Wave-U-Mamba: An End-To-End Framework For High-Quality And Efficient Speech Super Resolution

Sep-17-2024–arXiv.org Artificial Intelligence

Speech Super-Resolution (SSR) is the task of enhancing low-resolution speech signals by restoring missing high-frequency components. Conventional approaches typically reconstruct log-mel features and then use a vocoder to generate high-resolution speech in the waveform domain. However, because log-mel features lack phase information, this can degrade performance during the reconstruction phase. Motivated by recent advances in Selective State Space Models (SSMs), we propose a method, referred to as Wave-U-Mamba, that performs SSR directly in the time domain. In our comparative study, which includes models such as WSRGlow, NU-Wave 2, and AudioSR, Wave-U-Mamba demonstrates superior performance, achieving the lowest Log-Spectral Distance (LSD) across various low-resolution sampling rates ranging from 8 kHz to 24 kHz. Additionally, subjective human evaluations, scored using Mean Opinion Score (MOS), reveal that our method produces super-resolved speech with natural, human-like quality. Furthermore, Wave-U-Mamba achieves these results while generating high-resolution speech over nine times faster than the baseline models on a single A100 GPU, with a parameter count less than 2% of that of the baseline models.
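
For readers unfamiliar with the LSD metric mentioned above, the following is a minimal sketch of one common definition: the root-mean-square difference between the log power spectra of a reference and an estimate, computed per frame and averaged over frames. The function names, the Hann window, and the `n_fft`, `hop`, and `eps` values are illustrative assumptions; the abstract does not state the settings the authors used.

```python
import numpy as np

def stft_power(x, n_fft=2048, hop=512):
    """Power spectrogram |STFT|^2 with a Hann window.

    Returns an array of shape (frames, n_fft // 2 + 1).
    Window type and sizes are assumptions for illustration.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack(
        [x[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def log_spectral_distance(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """LSD between two waveforms at the same sampling rate (lower is better)."""
    n = min(len(reference), len(estimate))  # compare only the overlapping part
    ref_log = np.log10(stft_power(reference[:n], n_fft, hop) + eps)
    est_log = np.log10(stft_power(estimate[:n], n_fft, hop) + eps)
    # RMS over frequency bins for each frame, then mean over frames.
    per_frame = np.sqrt(np.mean((ref_log - est_log) ** 2, axis=1))
    return float(np.mean(per_frame))
```

Under this definition, identical signals give an LSD of zero, and a uniform gain change shifts every log-power bin by the same amount, so the metric is sensitive to level as well as to missing high-frequency content.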

Country:
- Asia > South Korea > Seoul > Seoul (0.04)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine (0.58)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (0.90)
  - Machine Learning > Neural Networks (0.69)
  - Representation & Reasoning (0.68)
