SSDM: Scalable Speech Dysfluency Modeling

Jiachen Lian, Xuanru Zhou, Zoe Ezzes, Jet Vonk, Brittany Morin, David Baquirin, Zachary Mille, Maria Luisa Gorno Tempini, Gopala Anumanchipalli

Sep-14-2024–arXiv.org Artificial Intelligence

Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, three challenges remain. First, current state-of-the-art solutions suffer from poor scalability. Second, there is no large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose SSDM: Scalable Speech Dysfluency Modeling, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces the connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. A demo is available at https://eureka235.github.io.

alignment, gestural score

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America
  - United States > Pennsylvania
    - Allegheny County > Pittsburgh (0.04)
  - Canada > Quebec
    - Montreal (0.04)
- Europe
  - Netherlands > Gelderland
    - Nijmegen (0.04)
  - Malta
    - Eastern Region > Northern Harbour District > St. Julian's (0.04)
  - Italy > Calabria
    - Catanzaro Province > Catanzaro (0.04)
  - Hungary > Budapest
    - Budapest (0.04)
  - Germany > Saxony
    - Dresden (0.04)
- Asia > South Korea
  - Incheon > Incheon (0.04)

Genre:
- Research Report > Promising Solution (0.34)

Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
