High-Fidelity Music Vocoder using Neural Audio Codecs

Lanzendörfer, Luca A.; Grötschla, Florian; Ungersböck, Michael; Wattenhofer, Roger

Feb-18-2025, arXiv.org Artificial Intelligence

While neural vocoders have made significant progress in high-fidelity speech synthesis, their application to polyphonic music remains underexplored. In this work, we propose DisCoder, a neural vocoder that leverages a generative adversarial encoder-decoder architecture informed by a neural audio codec to reconstruct high-fidelity 44.1 kHz audio from mel spectrograms. Our approach first transforms the mel spectrogram into a lower-dimensional representation aligned with the Descript Audio Codec (DAC) latent space, then reconstructs the audio signal from that representation using a fine-tuned DAC decoder. DisCoder achieves state-of-the-art performance in music synthesis on several objective metrics and in a MUSHRA listening study. Our approach also shows competitive performance in speech synthesis, highlighting its potential as a universal vocoder.
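The two-stage data flow described in the abstract (mel spectrogram to codec-aligned latent, then latent to waveform) can be illustrated with a toy sketch. This is not the DisCoder implementation: the real model uses trained neural networks and adversarial training, whereas the code below uses random linear maps purely to show how frames and shapes move through the pipeline. All names (`encode`, `decode`, `vocode`) and all sizes (`N_MELS`, `LATENT_DIM`, `HOP`) are hypothetical and chosen for illustration only.

```python
import random

# All sizes are illustrative assumptions, not values from the paper.
N_MELS = 8        # mel bins per frame (toy size)
LATENT_DIM = 4    # size of the codec-aligned latent per frame (toy size)
HOP = 16          # waveform samples produced per latent frame (toy size)


def make_matrix(rows, cols, rng):
    """Random weight matrix standing in for trained parameters."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def encode(mel, w_enc):
    """Stage 1: map each mel frame to a lower-dimensional latent frame."""
    return [matvec(w_enc, frame) for frame in mel]


def decode(latents, w_dec):
    """Stage 2: expand each latent frame into HOP waveform samples."""
    audio = []
    for z in latents:
        audio.extend(matvec(w_dec, z))
    return audio


def vocode(mel, w_enc, w_dec):
    """Full pipeline: mel frames -> latent frames -> waveform samples."""
    return decode(encode(mel, w_enc), w_dec)


rng = random.Random(0)
w_enc = make_matrix(LATENT_DIM, N_MELS, rng)  # hypothetical encoder weights
w_dec = make_matrix(HOP, LATENT_DIM, rng)     # hypothetical decoder weights
mel = [[rng.random() for _ in range(N_MELS)] for _ in range(5)]  # 5 mel frames
audio = vocode(mel, w_enc, w_dec)
print(len(audio))  # 5 frames * 16 samples per frame = 80
```

The point of the sketch is the bottleneck: each mel frame is first compressed to a smaller latent vector, and only then expanded to waveform samples, so the decoder never sees the mel spectrogram directly.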

artificial intelligence, machine learning, speech synthesis

Country:
- Europe (0.15)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Media > Music (0.35)
- Leisure & Entertainment (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks (0.69)
  - Speech > Speech Synthesis (0.55)
