The OSI has finally published a definition for open source AI systems, encouraging organizations to do more than just slap on the term "open source."
AI models have taken the tech world by storm, yet the systems behind the most popular ones remain an enigma because the companies that build them are reluctant to fully disclose their source.
Usually, these companies seem to imply that a competitor might gain an advantage by using their tech. However, there is a more clear-cut issue at hand.
The training methods and data used to train such proprietary and open weight models are never really shared openly, and we already know that there are plenty of copyrighted/IP-protected pieces of data in the outputs of such models.
The Open Source Initiative (OSI) recently called out Meta on this, as the company markets the Llama family of models as open source when, in reality, they are open weight at best.
Now, the OSI has introduced the first version of the long-awaited Open Source AI Definition (OSAID), which aims to tackle such issues by defining the concept explicitly.
The first version of the OSAID was drafted jointly by a diverse mix of organizations and individuals.
Organizations involved include the Open Knowledge Foundation, Wikimedia Foundation, Mozilla Foundation, Hugging Face, Amazon, Microsoft, Meta, and many others. (OpenAI doesn't seem to have collaborated.)
Without going too much into the technical aspects of the OSAID, here are some key points of the definition that you should be aware of:
To summarize, the definition covers all "fully functional structure and its discrete structural elements" under its purview. This includes things like the model, the weights, the parameters, etc.
This means that to meet the OSAID standards, an AI model's entire architecture, along with the smaller components that make it function, must be accessible and modifiable by anyone.
Additionally, the term "AI system" broadly covers any machine-based system that can take in inputs and generate outputs that affect physical or virtual environments for both explicit and implicit objectives.
You can learn more about the OSAID by going through the definition itself.
There is an inescapable pain point here: no clause stipulates that the training data itself be open sourced. This has led many people to question whether the definition truly encompasses the fundamental principles of open source when such a major part is left out.
The OSI is said to be working on updates and defining rules for the maintenance of the OSAID, but they have not clarified how they intend to handle the matter of open sourcing the training data.
But, of course, having this definition is better than not having a standard.
In any case, you can learn more about the OSAID by going through the deep dive published by the OSI, which gives an overview of the processes and governance-related aspects of the initiative.
💬 What are your views on the OSAID? Do you think they could have done a better job?