Stable Audio Open

Evans, Zach, Parker, Julian D., Carr, CJ, Zukowski, Zack, Taylor, Josiah, Pons, Jordi

Jul-31-2024–arXiv.org Artificial Intelligence

Open generative models are vitally important for the community, allowing for fine-tunes and serving as baselines when presenting new models. However, most current text-to-audio models are private and not accessible for artists and researchers to build upon. Here we describe the architecture and training process of a new open-weights text-to-audio model trained with Creative Commons data. Our evaluation shows that the model's performance is competitive with the state-of-the-art across various metrics. Notably, the reported FD_openl3 results (measuring the realism of the generations) showcase its potential for high-quality stereo sound synthesis at 44.1 kHz.
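As a rough illustration of the realism metric named in the abstract: a Fréchet distance (FD) compares a Gaussian fitted to embeddings of generated audio against one fitted to embeddings of reference audio, and lower values indicate closer statistics. The sketch below assumes the embeddings have already been extracted (for the metric in the abstract, presumably with an OpenL3 model, which is not called here) and uses only NumPy. The function name `frechet_distance` and the array shapes are illustrative assumptions, not the authors' evaluation code.

```python
import numpy as np


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets.

    emb_a, emb_b: arrays of shape (n_samples, dim), with dim >= 2.
    (Hypothetical inputs; embedding extraction is outside this sketch.)
    """
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)

    # tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b. Tiny negative or imaginary parts are
    # numerical noise, so keep the real part and clip at zero.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```

Two identical embedding sets give a distance of approximately zero, and shifting one set by a constant vector adds the squared length of that shift, since the covariances are unchanged.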

arxiv, stable audio 2, training data

Genre:
- Research Report (1.00)

Industry:
- Media (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.98)
  - Natural Language (0.89)
