Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection

Guo, Chenxu, Lian, Jiachen, Zhou, Xuanru, Zhang, Jinming, Li, Shuhe, Ye, Zongli, Park, Hwi Joo, Das, Anaisha, Ezzes, Zoe, Vonk, Jet, Morin, Brittany, Bogley, Rian, Wauters, Lisa, Miller, Zachary, Gorno-Tempini, Maria, Anumanchipalli, Gopala

May-27-2025–arXiv.org Artificial Intelligence

Automatic detection of speech dysfluency aids speech-language pathologists in efficient transcription of disordered speech, enhancing diagnostics and treatment planning. Traditional methods, often limited to classification, provide insufficient clinical insight, and text-independent models misclassify dysfluency, especially in context-dependent cases. This work introduces Dysfluent-WFST, a zero-shot decoder that simultaneously transcribes phonemes and detects dysfluency. Unlike previous models, Dysfluent-WFST operates with upstream encoders like WavLM and requires no additional training. It achieves state-of-the-art performance in both phonetic error rate and dysfluency detection on simulated and real speech data. Our approach is lightweight, interpretable, and effective, demonstrating that explicit modeling of pronunciation behavior in decoding, rather than complex architectures, is key to improving dysfluency processing systems.
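
To make the idea of modeling pronunciation behavior in the decoder concrete, here is a minimal sketch, not the authors' implementation. The abstract does not give the graph topology or weights, so everything below is an assumption for illustration: a known reference phoneme sequence, frame-level log-probabilities from some upstream encoder, a plain Viterbi search standing in for a real WFST library, and the hypothetical names `decode`, `label`, `toy_frames`, `skip_penalty`, and `back_penalty`. States are positions in the reference; backward jumps are read as repetitions and forward skips as deletions.

```python
import math

NEG = -math.inf

def decode(log_probs, ref, skip_penalty=-2.0, back_penalty=-2.0):
    """Viterbi over reference positions (toy stand-in for a WFST decoder).

    log_probs: one dict per frame mapping phoneme -> log-probability.
    ref: reference phoneme list. Penalties are hypothetical log-weights.
    Returns the best reference position for every frame.
    """
    n = len(ref)
    score = [NEG] * n
    score[0] = log_probs[0].get(ref[0], NEG)  # paths start at position 0
    backptrs = []
    for frame in log_probs[1:]:
        new, ptr = [NEG] * n, [0] * n
        for j in range(n):
            best, arg = NEG, 0
            for i in range(n):
                if j == i or j == i + 1:
                    w = 0.0                           # stay or advance: fluent
                elif j > i + 1:
                    w = skip_penalty * (j - i - 1)    # forward skip: deletion
                else:
                    w = back_penalty                  # backward jump: repetition
                cand = score[i] + w
                if cand > best:
                    best, arg = cand, i
            new[j] = best + frame.get(ref[j], NEG)
            ptr[j] = arg
        backptrs.append(ptr)
        score = new
    j = n - 1                                         # paths end at the last position
    path = [j]
    for ptr in reversed(backptrs):
        j = ptr[j]
        path.append(j)
    path.reverse()
    return path

def label(path, ref):
    """Collapse a frame-level path into a phoneme transcript plus events."""
    segments, events = [path[0]], []
    for prev, cur in zip(path, path[1:]):
        if cur == prev:
            continue
        if cur < prev:
            events.append(("repetition", ref[cur]))
        elif cur > prev + 1:
            events.append(("deletion", tuple(ref[prev + 1:cur])))
        segments.append(cur)
    return [ref[i] for i in segments], events

def toy_frames(spoken, inventory, p=0.9):
    """Fake encoder output: each frame puts probability p on the spoken phoneme."""
    q = (1.0 - p) / (len(inventory) - 1)
    return [{ph: math.log(p if ph == s else q) for ph in inventory} for s in spoken]

ref = ["p", "l", "iy", "z"]                           # "please"
frames = toy_frames(["p", "l", "p", "l", "iy", "z"], ref)  # "pl-please"
transcript, events = label(decode(frames, ref), ref)
print(transcript, events)
```

In this toy setup the transcript and the dysfluency events both come from a single best path, which is one way to read "simultaneously transcribes phonemes and detects dysfluency"; a real system would build the graph with a WFST toolkit and take its emissions from an encoder such as WavLM.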

Country:
- North America > United States (0.14)

Genre:
- Research Report (0.82)

Industry:
- Health & Medicine > Therapeutic Area (0.54)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.95)
