Exploring ASR-Based Wav2Vec2 for Automated Speech Disorder Assessment: Insights and Analysis

Nguyen, Tuan; Fredouille, Corinne; Ghio, Alain; Balaguer, Mathieu; Woisard, Virginie

arXiv.org Artificial Intelligence 

[...] ASR-based model has been fine-tuned for automated speech disorder quality assessment tasks, yielding impressive results and setting a new baseline for Head and Neck Cancer speech contexts. This demonstrates that the ASR dimension from Wav2Vec2 closely aligns with assessment dimensions. Despite its effectiveness, this system remains a black box with no clear interpretation of the connection between the model ASR dimension and clinical assessments. This paper presents the first analysis of this baseline model for speech quality assessment, focusing on intelligibility and severity tasks.

Some automatic systems have shown robust performance and stability by learning from expert decisions [6, 7]. In 2024, Nguyen et al. [8] introduced a system that leverages the Automatic Speech Recognition (ASR) based Wav2Vec2 model [9], known for its strong capability in learning speech representations. This approach compared self-supervised learning (SSL) and the ASR dimension for speech quality assessment. It is shown that the fine-tuning [...]