High-Fidelity Audio Compression with Improved RVQGAN

Dec-25-2025, 10:12:47 GMT–Neural Information Processing Systems

Language models have been successfully used to model natural signals, such as images, speech, and music. A key component of these models is a high-quality neural compression model that can compress high-dimensional natural signals into lower-dimensional discrete tokens. To that end, we introduce a high-fidelity universal neural audio compression algorithm that achieves ~90x compression of 44.1 kHz audio into tokens at just 8 kbps bandwidth. We achieve this by combining advances in high-fidelity audio generation with better vector quantization techniques from the image domain, along with improved adversarial and reconstruction losses.
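
The ~90x figure can be sanity-checked with simple bitrate arithmetic. The sketch below assumes the uncompressed baseline is 16-bit mono PCM, which the abstract does not state; the function names are hypothetical and only illustrate the calculation.

```python
def pcm_bitrate_kbps(sample_rate_hz, bit_depth=16, channels=1):
    """Bitrate of uncompressed PCM audio in kbps.

    bit_depth=16 and channels=1 are assumptions, not values from the abstract.
    """
    return sample_rate_hz * bit_depth * channels / 1000.0


def compression_ratio(sample_rate_hz, target_kbps, bit_depth=16, channels=1):
    """Ratio of the raw PCM bitrate to the compressed token bitrate."""
    return pcm_bitrate_kbps(sample_rate_hz, bit_depth, channels) / target_kbps


# 44.1 kHz audio compressed to 8 kbps, as quoted in the abstract.
raw_kbps = pcm_bitrate_kbps(44_100)
ratio = compression_ratio(44_100, 8.0)

print(f"raw PCM bitrate: {raw_kbps:.1f} kbps")
print(f"compression ratio: {ratio:.1f}x")
```

Under these assumptions the raw stream is 705.6 kbps, so 8 kbps corresponds to roughly 88x, consistent with the rounded ~90x claim. A stereo or 24-bit baseline would give a proportionally larger ratio.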

high-fidelity audio compression, improved rvqgan, name change

Technology:
- Information Technology > Artificial Intelligence (0.40)