With the advent of AI-based solutions, managing data properly and securely at scale has become a significant challenge. This has opened up many new career paths, and entire organizations have been established to cater to such requirements.
Databricks is one such organization. Founded in 2013 by the creators of Apache Spark, Delta Lake, and MLflow, it claims to be the “world’s first and only lakehouse platform in the cloud”.
At a recent event, they announced their latest move toward embracing open source, which will see them open up one of their most popular tools.
A Big Announcement: What To Expect?
During their annual Data + AI Summit event, Ali Ghodsi, CEO of Databricks, announced (at the 16:00 timestamp) that Unity Catalog, their solution for governance of data and AI, was going open source under the “Unity Catalog OSS” name.
Sadly, he never mentioned which license it would be released under, but the announcement says it includes an Apache 2.0 licensed open-source server. So, I guess that's the one.
After its launch, Unity Catalog OSS is set to act as a “universal interface” with support for any data format and compute engine.
The developers note that its tables can be read by Delta Lake, Apache Iceberg, and Apache Hudi clients via Delta Lake UniForm. There's also support for the Iceberg REST Catalog and Hive Metastore (HMS) interface standards.
In reaction to this move, many of Databricks' customers and partners, such as AWS, AT&T, Google, Rivian, NVIDIA, and more, shared their views.
One such view came from Jessica Hawk, CVP, Data, AI, Digital Applications at Microsoft, who said:
Microsoft is committed to the open-source community and empowering customers with choice. Databricks has been a strategic partner for years and it's great to see them open-sourcing Unity Catalog. We believe truly open standards with broad industry participation are in customers' best interests.
For those who aren't familiar, Unity Catalog is a governance solution for managing data (structured and unstructured, in any format), machine learning models, notebooks, and more.
It also provides users with helpful dashboards to better manage their stack, along with AI-powered monitoring, and many users take advantage of its “single permission model” for straightforward access management.
Want to Check it Out?
Well, at the time of writing, the code for Unity Catalog OSS was still not up on the Databricks GitHub repo.
However, the company has clarified that it will be made available as part of Matei Zaharia's keynote during an upcoming session at the summit on Thursday, after which we should be able to explore it further.
Keep an eye on the repo so you can take a look once the code is available.
If you are eager to learn more about this move, you can refer to the official announcement blog and the documentation.