Catch-A-Waveform: Learning to Generate Audio from a Single Short Example

Jan-18-2025, 16:55:22 GMT – Neural Information Processing Systems

Models for audio generation are typically trained on hours of recordings. Here, we illustrate that capturing the essence of an audio source is typically possible from as little as a few tens of seconds from a single training signal. Specifically, we present a GAN-based generative model that can be trained on one short audio signal from any domain (e.g. ...). Once trained, our model can generate random samples of arbitrary duration that maintain semantic similarity to the training waveform, yet exhibit new compositions of its audio primitives. This enables a long line of interesting applications, including generating new jazz improvisations or new a-cappella rap variants based on a single short example, and producing coherent modifications to famous songs (e.g. ...). We show that in all cases, no more than 20 seconds of training audio commonly suffice for our model to achieve state-of-the-art results.

catch-a-waveform, generate audio, single short example

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (0.66)
  - Natural Language (0.62)