Riffusion's AI generates music from text using visual sonograms

Dec-17-2022, 09:25:29 GMT–#artificialintelligence

On Thursday, a pair of tech hobbyists released Riffusion, an AI model that generates music from text prompts by creating a visual representation of sound and converting it to audio for playback. It uses a fine-tuned version of the Stable Diffusion 1.5 image synthesis model, applying visual latent diffusion to sound processing in a novel way. Created as a hobby project by Seth Forsgren and Hayk Martiros, Riffusion works by generating sonograms, which store audio in a two-dimensional image. In a sonogram, the X-axis represents time (the order in which the frequencies get played, from left to right), and the Y-axis represents the frequency of the sounds. Meanwhile, the color of each pixel in the image represents the amplitude of the sound at that given moment in time.

music, riffusion, sonogram, (10 more...)

#artificialintelligence

Dec-17-2022, 09:25:29 GMT

News Web Page

Add feedback

Industry:
- Media > Music (0.78)
- Leisure & Entertainment (0.78)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.44)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found