IBM has a rich history in the tech space, constantly evolving with global trends and pushing for the next big thing in the industry. Unsurprisingly, the AI craze has not gone unnoticed by the company.
It recently introduced the Granite 3.0 family of open source LLMs, along with updates to RHEL AI, its enterprise-grade AI platform. IBM is not done yet, though: it has now officially launched Docling, an ongoing project first released back in July 2024.
Docling: What To Expect?
Developed as an open source toolkit under the MIT License, Docling extracts content from documents and exports it to Markdown and JSON, so that large language models (LLMs) and other foundation models can easily consume it.
It is powered by two models designed by IBM researchers. The first is a vision model that uses object-detection techniques to work out the layout of each page in a document, and then identifies and classifies blocks of text, images, tables, and so on.
The second is TableFormer, which converts image-based tables into machine-readable formats with rows and columns of cells.
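To make "rows and columns of cells" a little more concrete, here is a minimal Python sketch of what becomes possible once a table's cells sit in a grid: the same table can be serialized to Markdown and to JSON, the two output formats Docling targets. The `Table` alias, the function names, and the sample data are hypothetical illustrations, not Docling's API or its internal data model.

```python
import json

# Hypothetical representation: a table whose cells have already been
# recognized and arranged into rows and columns. Row 0 is the header.
# This is an illustration only, not Docling's data model.
Table = list[list[str]]


def table_to_markdown(rows: Table) -> str:
    """Render a grid of cells as a Markdown table (row 0 becomes the header)."""
    header, *body = rows
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in body:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


def table_to_json(rows: Table) -> str:
    """Render the same grid as a JSON list of records keyed by header cells."""
    header, *body = rows
    return json.dumps([dict(zip(header, row)) for row in body])


cells = [
    ["Model", "Task"],
    ["Layout model", "Detect and classify page regions"],
    ["TableFormer", "Recover table structure"],
]
print(table_to_markdown(cells))
print(table_to_json(cells))
```

This sketch assumes a clean rectangular grid; a real exporter would also have to deal with merged cells and escape any `|` characters inside cell text, which the code above does not do.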
To demonstrate its capabilities, IBM says it has tested Docling extensively: the researchers behind InstructLab used it to extract information from PDFs to train InstructLab's underlying AI models.
They used it to analyze 2.1 million PDFs from Common Crawl, turning the raw data into usable AI training data. There are also plans to process a whopping 1.8 billion PDF files and feed the extracted data into a future version of Granite.
Docling is targeted at enterprise use, where crunching large amounts of data and organizing it properly is crucial. It can process technical manuals, user guides, specifications, legal documents, and practically any other structured document.
A typical use case would be training internal AI models on company data to aid knowledge sharing and optimize workflows. IBM researchers also plan to add support for more complex data types, such as math equations and charts.
Want To Check It Out?
Docling comes with a command-line interface and a Python API, and it has been optimized to run on conventional laptops. IBM claims that it takes only five lines of code to integrate it with open source LLM frameworks like LlamaIndex and LangChain.
You can get started with Docling by going through its documentation and the GitHub repository. If you would like to learn more about it, the announcement blog is definitely worth a read.