Google is a name that most of us are familiar with. Beyond making the headlines, they also support an impressive lineup of open-source projects that have shaped how we experience the internet today.
Now, with the launch of their AI Cyber Defense Initiative, they have open-sourced Magika, their AI-powered file-type identification tool, so that others can take advantage of its capabilities and build upon it.
Magika: What Is It?
Magika is a tool that uses artificial intelligence to detect commonly used file types such as PNG, JPG, PDF, APK, and quite a few others.
Google claims that it outperforms traditional file identification tools and methods, with an average precision of over 99%. The most obvious use case for this is cybersecurity, but more on that later.
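To see what "traditional" file identification usually means, here is a minimal sketch of the rule-based approach: matching a file's first bytes (its "magic number") against known signatures. This is an illustration of the baseline Magika is compared against, not how Magika itself works, and the `sniff` function and its signature table are hypothetical.

```python
# A minimal sketch of rule-based file-type detection via magic numbers.
# This is NOT Magika's method; Magika uses a trained deep-learning model.
# `SIGNATURES` and `sniff` are hypothetical names for illustration only.

# Leading-byte signatures for a few of the formats mentioned above.
# APK files are ZIP archives, so a signature check alone cannot tell an APK
# apart from any other ZIP-based file.
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"%PDF-", "pdf"),
    (b"PK\x03\x04", "zip (could be apk, jar, docx, ...)"),
]


def sniff(data: bytes) -> str:
    """Return a best-guess label based on the first bytes of `data`."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    # Text formats such as source code usually have no magic number,
    # so this simple sketch cannot classify them at all.
    return "unknown"


if __name__ == "__main__":
    print(sniff(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
    print(sniff(b"%PDF-1.7\n"))
    print(sniff(b"print('hello')\n"))
```

The last case hints at why a learned model helps: many file types, especially text-based ones, have no fixed signature, and ZIP-based formats share one.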
Magika didn't appear out of thin air; Google has been using it internally in Gmail, Drive, and Safe Browsing to route files to the relevant security and content policy scanners.
This is made possible by a custom, highly optimized deep-learning model, trained with Keras, that weighs only about 1 MB.
Inference is also fast: thanks to ONNX, identifying a file takes just a few milliseconds, comparable to non-AI tools, even on a CPU.
They also shared benchmarks comparing Magika against other tools: on a benchmark of 1M files spanning over 100 file types, Magika's average F1 score was about 20% higher than that of the other tools.
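For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall. The sketch below assumes that an "average F1" over many file types is a macro-average (an unweighted mean of per-type scores); the announcement does not spell out the exact averaging, and the per-type numbers in the example are made up for illustration.

```python
# F1 = 2 * precision * recall / (precision + recall).
# Assumption: "average F1" is a macro-average, i.e. the plain mean of the
# per-file-type F1 scores. The example numbers below are hypothetical.


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(per_type: dict[str, tuple[float, float]]) -> float:
    """Unweighted mean of per-type F1 scores from (precision, recall) pairs."""
    scores = [f1(p, r) for p, r in per_type.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Hypothetical (precision, recall) per file type, purely illustrative.
    example = {
        "pdf": (0.99, 0.98),
        "png": (1.00, 1.00),
        "python": (0.90, 0.80),
    }
    print(f"macro F1 = {macro_f1(example):.3f}")
```

Because every file type counts equally in a macro-average, a tool that struggles with a handful of hard-to-detect types gets pulled down noticeably, even if it handles common formats perfectly.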
Helping the Cybersecurity Game
A tool like Magika can be a very potent thing to have by your side, as file scanning at such speeds was previously unheard of. Open-sourcing it opens the door for security-focused services and products to use it as a reliable component in better protecting their customers.
Google itself has already begun integrating Magika into VirusTotal, the online service it acquired in 2012 that helps analyze suspicious files and URLs.
With this integration, they plan to further bolster VirusTotal's existing Code Insight functionality.
The official announcement blog post has more details if you are up for it. Otherwise, stick around a bit longer to learn how to try Magika.
How Can You Try It?
The most straightforward way to try out Magika is the demo hosted on the official website, which can distinguish the file types of multiple uploaded files at once.
If you want to run it locally, or on a server, then you can install it as a Python package:
pip install magika
Then, run it by passing it one or more files to identify:
magika path/to/file
For more command examples or the official documentation, I highly suggest visiting Magika's GitHub repo. However, at the time of writing, it had a weird disclaimer at the bottom that said 😐
This project is not an official Google project. It is not supported by Google and Google specifically disclaims all warranties as to its quality, merchantability, or fitness for a particular purpose.
My best guess is that either this is a mistake or I am missing something here. Either way, only time will tell which assumption is correct.
Another interesting bit 💡
When asked in a Hacker News discussion why they had released a Node module for Magika, one of the co-authors, Elie Bursztein, said:
We did release the npm package because indeed we create a web demo and thought people might want to also use it. We know it is not as fast as the python version or a C++ version – which why we did mark it as experimental.
The release include the python package and the cli which are quite fast and is the main way we did expect people to use – sorry if that hasn't be clear in the post.
There also seem to be plans for a .deb and similar packages, which I spotted in one of the newly created issues on the Magika repo. It is nice to see that they intend to support Linux in more ways than one.
💬 What do you think of this move by Google? Was open-sourcing such a tool the right call?