Docling: IBM's Latest Enterprise-Focused Open Source Offering for Generative AI

It's good to see open source toolkits for building generative AI applications.

IBM has a rich history in the tech space, with the company constantly evolving according to global trends and pushing for the next big thing in the industry. Unsurprisingly, the AI craze has not gone unnoticed by them.

They recently introduced the Granite 3.0 family of open source LLMs, along with updates to RHEL AI, their enterprise-grade AI platform. But it appears they are not done, as they have now officially launched Docling, a project that has been in the works for a while (it was initially released back in July 2024).

Docling: What To Expect?

The Docling process flow. (Source: GitHub)

Developed as an open source toolkit under the MIT License, Docling parses documents and exports their content to Markdown and JSON, formats that large language models (LLMs) and foundation models can read easily.

It is powered by two models designed by IBM researchers. The first is a vision model that uses object-detection techniques to work out the layout of each page in a document, then identifies and classifies blocks of text, images, tables, etc.

The other is TableFormer, which converts image-based tables into machine-readable formats with rows and columns of cells.
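To make "rows and columns of cells" a bit more concrete, here is a toy sketch in Python of what a machine-readable table can look like once each cell has a row and column position. The `(row, col, text)` cell format and both function names are assumptions made up for illustration; this is not TableFormer's actual output schema or code.

```python
def cells_to_grid(cells):
    """Arrange (row, col, text) tuples into a rectangular list of rows.

    `cells` must be non-empty; positions with no cell stay as empty strings.
    """
    n_rows = max(row for row, _, _ in cells) + 1
    n_cols = max(col for _, col, _ in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for row, col, text in cells:
        grid[row][col] = text
    return grid


def grid_to_markdown(grid):
    """Render a grid as a Markdown table, treating the first row as the header."""
    header, *body = grid
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


# Hypothetical cells, as they might be recovered from a table image.
example_cells = [
    (0, 0, "Document"), (0, 1, "Pages"),
    (1, 0, "User guide"), (1, 1, "42"),
]
example_grid = cells_to_grid(example_cells)
example_markdown = grid_to_markdown(example_grid)
```

Once a table is in this kind of grid form, exporting it to Markdown or JSON is straightforward, which is what makes it usable as LLM input.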

To demonstrate its capabilities, IBM says it has tested Docling extensively, with the researchers behind InstructLab using it to extract information from PDFs to train InstructLab's underlying AI models.

They used it to analyze 2.1 million PDFs from Common Crawl, converting the raw data into usable AI training data. Moreover, there are plans to go through a whopping 1.8 billion PDF files and feed the extracted data into a future version of Granite.

Docling is targeted at enterprise use, where crunching large amounts of data and organizing it properly is crucial. It can be used to process technical manuals, user guides, specifications, legal documents, and practically any other structured document.

A typical use case would be training internal AI models on company data to aid knowledge sharing and optimize workflows. The IBM researchers are also planning to introduce support for more complex data types such as math equations, charts, etc.

Want To Check It Out?

Docling is equipped with a command-line interface and a Python API, and it has been optimized to run on conventional laptops. IBM claims that it only takes five lines of code to configure/integrate it with open source LLM frameworks like LlamaIndex and LangChain.
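For a rough idea of what the Python API looks like, here is a minimal sketch based on the basic usage shown in Docling's README. It assumes Docling is installed (`pip install docling`). The helper names `convert_document` and `save_outputs` and the file name `manual.pdf` are made up for this example. Check the official documentation for the current API before relying on it.

```python
import json
from pathlib import Path


def convert_document(source, converter=None):
    """Convert one document and return (markdown_text, json_ready_dict).

    `source` is a local path or URL. `converter` is optional so the helper
    can be exercised with a stand-in object; by default a Docling
    DocumentConverter is created.
    """
    if converter is None:
        # Imported lazily so this module loads even without Docling installed.
        from docling.document_converter import DocumentConverter
        converter = DocumentConverter()
    result = converter.convert(source)
    doc = result.document
    return doc.export_to_markdown(), doc.export_to_dict()


def save_outputs(markdown, data, out_dir, stem):
    """Write the Markdown and JSON exports side by side and return both paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    md_path = out / f"{stem}.md"
    json_path = out / f"{stem}.json"
    md_path.write_text(markdown, encoding="utf-8")
    json_path.write_text(json.dumps(data, indent=2), encoding="utf-8")
    return md_path, json_path


# Example usage with a hypothetical file name:
#   markdown, data = convert_document("manual.pdf")
#   save_outputs(markdown, data, "converted", "manual")
```

The first conversion typically downloads the model weights, so expect it to take longer than later runs.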

You can get started with Docling by going through its documentation and the GitHub repository. If you would like to learn more about it, the announcement blog is definitely worth a read.
