SSDM: Scalable Speech Dysfluency Modeling
Neural Information Processing Systems
Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, there are three challenges. First, current state-of-the-art solutions [1, 2] suffer from poor scalability. Second, there is a lack of a large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose SSDM: Scalable Speech Dysfluency Modeling, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling.
Mar-27-2025, 03:21:58 GMT
- Country:
- Europe
- Germany (0.14)
- Middle East > Malta (0.14)
- Netherlands (0.14)
- North America > United States (0.14)
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Education (0.66)
- Health & Medicine > Therapeutic Area
- Neurology (0.67)
- Information Technology (0.92)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (1.00)
- Natural Language
- Chatbot (0.93)
- Large Language Model (1.00)
- Representation & Reasoning (1.00)
- Speech > Speech Recognition (1.00)
- Vision (1.00)