
Is The AI Open Source? OSI Gives a New Definition to Help Us Know That!

The OSI has finally published a definition for open source AI systems, encouraging organizations to do more than simply slap the "open source" label on their models.

AI models have taken the tech world by storm, yet the systems behind the most popular ones remain an enigma because the companies that build them are reluctant to fully disclose their source.

Usually, they imply that a competitor could gain an advantage from their tech. However, there is a more clear-cut issue at hand.

The training methods and data used for such proprietary and open weight models are almost never shared openly, and we already know that copyrighted or otherwise IP-protected material shows up in the outputs of these models.

The Open Source Initiative (OSI) recently called out Meta on this, as the company markets its Llama family of models as open source when, in reality, they are open weight at best.

Now, the OSI has introduced the first version of the long-awaited Open Source AI Definition (OSAID), which aims to tackle such issues by defining the concept explicitly.

Open Source AI Definition: What To Expect?


The first version of the OSAID was drafted jointly by a diverse mix of organizations and individuals.

Organizations involved include the Open Knowledge Foundation, Wikimedia Foundation, Mozilla Foundation, Hugging Face, Amazon, Microsoft, Meta, and many others (OpenAI does not appear to have been involved).

Without going too deep into the technical details of the OSAID, here are the key points of the definition you should be aware of:

  • Anyone should be able to use an open source AI system for any purpose, study how it works, modify it to suit their needs, and share it freely, with or without modifications.
  • The precondition for exercising these freedoms is access to the “preferred form” for making modifications to the AI system.
  • The “preferred form” must include a complete description of the data used for training, disclosure of that data's provenance, a list of all publicly available training data, and the source code used to train the AI system. The model parameters and configuration settings must be provided as well.

To summarize, the definition treats an AI system as a “fully functional structure and its discrete structural elements”, which includes the model, its weights, its parameters, and so on.

This means that to meet the OSAID, an AI model's entire architecture, along with the smaller components that make it function, must be accessible to and modifiable by anyone.

Additionally, the term “AI system” broadly covers any machine-based system that, for explicit or implicit objectives, takes in inputs and generates outputs that can influence physical or virtual environments.

You can learn more about the OSAID by reading the definition itself.

Closing Thoughts

There is an inescapable pain point here: no clause requires the training data itself to be open sourced. This has led many people to question whether the definition truly upholds the fundamental principles of open source when such a major part is left out.

The OSI is reportedly working on updates and on rules for maintaining the OSAID, but it has not clarified how it intends to handle the question of open sourcing the training data.

Still, having this definition is better than having no standard at all.

In any case, you can learn more about the OSAID from the deep dive published by the OSI, which gives an overview of the process and governance behind the initiative.

💬 What are your views on OSAID? Do you think they could have done a better job?