Nvidia's new AI model can create 'unheard sounds' like never before
Nvidia has been instrumental in the current AI boom, but primarily as the manufacturer of the GPUs that power next-gen AI workloads. Now the company has joined the fray with an AI model of its own that does something genuinely novel. As reported by Ars Technica, Nvidia's new model, called Fugatto, combines new AI training methods and techniques to transform music, voices, and other sounds in ways that haven't been done before, creating soundscapes never before experienced. Fugatto is built on an architecture with 2.5 billion parameters and was trained on more than 50,000 hours of annotated audio data. The model uses a technique NVIDIA calls ComposableART, which can combine and control different sound properties based on text or audio prompts.
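NVIDIA hasn't published the exact math behind ComposableART in this coverage, but the idea of combining independently prompted sound properties with adjustable strengths resembles weighted classifier-free guidance, where each condition's output is blended relative to an unconditional baseline. The sketch below illustrates that general pattern on toy vectors; the function name, the stand-in "latents," and the weighting scheme are all illustrative assumptions, not NVIDIA's implementation.

```python
import numpy as np

def composable_guidance(cond_outputs, weights, uncond_output):
    """Blend several conditional model outputs, classifier-free-guidance
    style: start from the unconditional prediction and add each
    condition's contribution scaled by its weight. (Illustrative only.)"""
    out = uncond_output.copy()
    for pred, w in zip(cond_outputs, weights):
        out += w * (pred - uncond_output)
    return out

# Toy stand-ins for model outputs (e.g. audio latents).
uncond = np.zeros(4)
rain = np.array([1.0, 0.0, 0.0, 0.0])      # hypothetical "rain" condition
thunder = np.array([0.0, 1.0, 0.0, 0.0])   # hypothetical "thunder" condition

# Emphasize rain twice as strongly as thunder.
mixed = composable_guidance([rain, thunder], [2.0, 1.0], uncond)
```

Varying the weights is what would let a user dial one sound property up or down independently of the others, which matches the article's description of controlling combined sound properties from prompts.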
NVIDIA's new AI model Fugatto can create audio from text prompts
NVIDIA has debuted a new experimental generative AI model, which it describes as "a Swiss Army knife for sound." The model, called Foundational Generative Audio Transformer Opus 1, or Fugatto, takes commands from text prompts and uses them to create audio or to modify existing music, voice, and sound files. It was designed by a team of AI researchers from around the world, which NVIDIA says made the model's "multi-accent and multilingual capabilities stronger." "We wanted to create a model that understands and generates sound like humans do," said Rafael Valle, one of the researchers behind the project and a manager of applied audio research at NVIDIA. In its announcement, the company listed some possible real-world scenarios where Fugatto could be of use. Music producers, it suggested, could use the technology to quickly generate a prototype for a song idea, which they could then easily edit to try out different styles, voices, and instruments.