Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech
El Hajal, Karl, Hermann, Enno, Hovsepyan, Sevada, Magimai-Doss, Mathew
arXiv.org Artificial Intelligence
Automatic speech recognition (ASR) systems struggle with dysarthric speech due to high inter-speaker variability and slow speaking rates. To address this, we explore dysarthric-to-healthy speech conversion for improved ASR performance. Our approach extends the Rhythm and Voice (RnV) conversion framework by introducing a syllable-based rhythm modeling method suited for dysarthric speech. We assess its impact on ASR by training LF-MMI models and fine-tuning Whisper on converted speech. Experiments on the Torgo corpus reveal that LF-MMI achieves significant word error rate reductions, especially for more severe cases of dysarthria, while fine-tuning Whisper on converted data has minimal effect on its performance. These results highlight the potential of unsupervised rhythm and voice conversion for dysarthric ASR. Code available at: https://github.com/idiap/RnV
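The abstract reports results as word error rate (WER) reductions. As a reference for how that metric is computed, WER is the word-level Levenshtein edit distance (substitutions, insertions, deletions) between a reference transcript and an ASR hypothesis, normalized by the reference length. A minimal, self-contained sketch (not from the linked RnV codebase):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)
```

For example, `wer("the cat sat", "the bat sat")` counts one substitution over three reference words, giving 1/3.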
Jun 3, 2025