In recent times, AI-powered tools have broken into the audio production space, with many options offering something that sets them apart from the competition.
Stability AI is one such popular generative AI company that offers a range of AI tools for creating images, audio, and videos with ease. They have recently announced an open-source AI text-to-audio model under the “Stable Audio Open” moniker, which is geared towards creators.
Let's find out what it can do.
Stable Audio Open: What to Expect?
Once set up, the model lets users generate up to 47 seconds of high-quality audio from a text prompt. It specializes in creating instrument riffs, ambient sounds, drum beats, Foley recordings, and more.
It has been made open-source under the Stability AI Non-Commercial Research Community License that prohibits any form of commercial usage, and users are required to accept Stability AI's privacy policy too.
Users are also free to fine-tune the model on their own audio data. Stability AI gave an interesting example:
A drummer could fine-tune on samples of their own drum recordings to generate new beats.
As for the model's inner workings, it has 1.21 billion parameters and three key components: an autoencoder, a T5-based text embedding, and a transformer-based diffusion model (DiT).
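To put the parameter count in perspective, here is a rough back-of-the-envelope sketch of how much memory the weights alone would take. Only the 1.21 billion figure comes from the announcement; the helper name `weight_memory_gb` is made up for illustration, and the estimate ignores activations and any other runtime overhead.

```python
# Rough weight-memory estimate for a 1.21-billion-parameter model.
# Only the parameter count comes from the article; the bytes-per-parameter
# values are the standard storage sizes of each numeric format.
PARAMS = 1_210_000_000

BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(params: int, dtype: str) -> float:
    """Return the storage needed for the weights, in GB (1 GB = 10**9 bytes)."""
    return params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(PARAMS, dtype):.2f} GB")
```

So the weights would need somewhere around 4.8 GB at full precision and about half of that at half precision, before counting anything else the model needs at run time.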
For training data, it uses a total of 486,492 audio recordings, of which 472,618 are from Freesound and 13,874 are from the Free Music Archive (FMA). All of these are licensed under CC0, CC BY, or CC Sampling+.
To ensure there was no copyrighted audio in the datasets, they sent the music samples identified in the Freesound subset to Audible Magic to check for the presence of copyrighted music.
They did find some, but those were removed before training began. In the case of the FMA subset, they used a different method to check for copyrighted content by performing a metadata search against a large database of copyrighted music.
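As a quick sanity check on the dataset figures quoted above, the short sketch below adds up the two sources and works out how the recordings are split between them. The counts are the ones from the article; the `share` helper is just an illustrative name.

```python
# Dataset composition check, using the recording counts quoted in the article.
FREESOUND = 472_618
FMA = 13_874

total = FREESOUND + FMA


def share(count: int, whole: int) -> float:
    """Return count as a percentage of whole."""
    return 100 * count / whole


print(f"Total recordings: {total:,}")
print(f"Freesound: {share(FREESOUND, total):.1f}%")
print(f"FMA: {share(FMA, total):.1f}%")
```

The two subsets do add up to the stated total, and roughly 97% of the recordings come from Freesound, with FMA contributing the remaining 3% or so.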
Stability AI also pointed out that Stable Audio Open is different from their commercial product, Stable Audio, which allows subscribers to generate high-quality tracks of up to 3 minutes with a coherent musical structure, along with some other advanced capabilities.
So, to wrap things up, I will say this: Stable Audio Open is a great option for individual users who are into audio production, or just want to mess around and see what kind of interesting output they can generate.
You can learn more about it by going through the announcement blog.
Get Stable Audio Open
Stability AI has uploaded the Stable Audio Open model weights to Hugging Face, and is encouraging professionals to explore the model as well as provide feedback.
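If you want to try the weights yourself, the following is a minimal sketch of one possible way to do it, not an official recipe. It assumes the Hugging Face `diffusers` library and its `StableAudioPipeline`, plus `torch` and `soundfile`, none of which are mentioned in the announcement. The repository ID `stabilityai/stable-audio-open-1.0` is also an assumption, and you will likely need to accept the license on Hugging Face and be logged in before the download works. The `clamp_duration` and `generate` helpers are hypothetical names used for illustration.

```python
# Sketch: text-to-audio generation with Stable Audio Open.
# Assumptions (not from the article): the diffusers StableAudioPipeline,
# the repository ID below, and the two helper functions defined here.
MODEL_ID = "stabilityai/stable-audio-open-1.0"  # assumed repository ID
MAX_SECONDS = 47.0  # upper limit quoted in the article


def clamp_duration(seconds: float) -> float:
    """Keep the requested clip length within the model's 47-second limit."""
    if seconds <= 0:
        raise ValueError("duration must be positive")
    return min(float(seconds), MAX_SECONDS)


def generate(prompt: str, seconds: float = 10.0, out_path: str = "output.wav") -> str:
    """Generate audio from a text prompt and save it as a WAV file."""
    # Heavy third-party imports stay inside the function, so clamp_duration
    # remains usable even when these packages are not installed.
    import soundfile as sf
    import torch
    from diffusers import StableAudioPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    pipe = StableAudioPipeline.from_pretrained(MODEL_ID, torch_dtype=dtype).to(device)
    result = pipe(
        prompt,
        audio_end_in_s=clamp_duration(seconds),
        num_inference_steps=100,
    )

    # result.audios[0] is a tensor shaped (channels, samples);
    # soundfile expects (samples, channels), hence the transpose.
    audio = result.audios[0].T.float().cpu().numpy()
    sf.write(out_path, audio, pipe.vae.sampling_rate)
    return out_path
```

Calling something like `generate("A gentle rain falling on a tin roof", seconds=20)` should then write a WAV file to the current directory. Expect a sizeable download on the first run, and keep in mind that the license only covers non-commercial use.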
💬 In a sea of AI models, this one tries to offer something distinctive, I think. Have you come across other interesting ones? Let me know!