SSDM: Scalable Speech Dysfluency Modeling
Lian, Jiachen; Zhou, Xuanru; Ezzes, Zoe; Vonk, Jet; Morin, Brittany; Baquirin, David; Mille, Zachary; Tempini, Maria Luisa Gorno; Anumanchipalli, Gopala
–arXiv.org Artificial Intelligence
Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, three challenges remain. First, current state-of-the-art solutions scale poorly. Second, there is no large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose SSDM: Scalable Speech Dysfluency Modeling, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. A demo is available at https://eureka235.github.io.
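To give a rough sense of what aligning dysfluent speech to a reference as a subsequence can mean, here is a minimal sketch. It is not the paper's CSA, which the abstract does not specify. It swaps in a classic longest-common-subsequence (LCS) alignment over phoneme symbols, and the function names and the ARPAbet-style phonemes are assumptions made for illustration only.

```python
def lcs_align(reference, spoken):
    """Return (ref_index, spoken_index) pairs for one longest common subsequence.

    Illustrative stand-in only: a plain LCS dynamic program, not the CSA
    proposed in the paper.
    """
    n, m = len(reference), len(spoken)
    # dp[i][j] = LCS length of reference[i:] and spoken[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if reference[i] == spoken[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    # Walk forward through the table to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if reference[i] == spoken[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1  # reference phoneme has no match (possible deletion)
        else:
            j += 1  # spoken phoneme has no match (possible repetition/insertion)
    return pairs


def unmatched(reference, spoken):
    """Return (missing_ref_indices, extra_spoken_indices) under the LCS alignment."""
    pairs = lcs_align(reference, spoken)
    matched_ref = {i for i, _ in pairs}
    matched_spk = {j for _, j in pairs}
    missing = [i for i in range(len(reference)) if i not in matched_ref]
    extra = [j for j in range(len(spoken)) if j not in matched_spk]
    return missing, extra


# Hypothetical example: "please" spoken with a repeated initial phoneme.
reference = ["P", "L", "IY", "Z"]
repeated = ["P", "P", "L", "IY", "Z"]
print(lcs_align(reference, repeated))
print(unmatched(reference, repeated))
```

In this toy setting, spoken phonemes left unmatched are candidate repetitions or insertions, and reference phonemes left unmatched are candidate deletions. A real system would work on acoustic features with learned, probabilistic alignment rather than exact symbol matching.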
Sep-14-2024
- Country:
- Asia > South Korea
- Europe
- Germany > Saxony
- Dresden (0.04)
- Hungary > Budapest
- Budapest (0.04)
- Italy > Calabria
- Catanzaro Province > Catanzaro (0.04)
- Middle East > Malta
- Eastern Region > Northern Harbour District > St. Julian's (0.04)
- Netherlands > Gelderland
- Nijmegen (0.04)
- North America
- Canada > Quebec
- Montreal (0.04)
- United States > Pennsylvania
- Allegheny County > Pittsburgh (0.04)
- South America > Chile
- Genre:
- Research Report > Promising Solution (0.34)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.68)
- Technology: