AITopics | dysfluency


SSDM: Scalable Speech Dysfluency Modeling

Neural Information Processing Systems | Feb-17-2026, 17:27:49 GMT

However, there are three challenges.

large language model, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(6 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.92)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

SSDM: Scalable Speech Dysfluency Modeling

Neural Information Processing Systems | Oct-10-2025, 14:29:34 GMT

However, there are three challenges.

alignment, arxiv preprint arxiv, gestural score, (15 more...)

Neural Information Processing Systems

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(6 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Information Technology (0.92)
Health & Medicine > Therapeutic Area > Neurology (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

StutterCut: Uncertainty-Guided Normalised Cut for Dysfluency Segmentation

Ghosh, Suhita, Jouaiti, Melanie, Perschewski, Jan-Ole, Stober, Sebastian

arXiv.org Artificial Intelligence | Aug-5-2025

Detecting and segmenting dysfluencies is crucial for effective speech therapy and real-time feedback. However, most methods only classify dysfluencies at the utterance level. We introduce StutterCut, a semi-supervised framework that formulates dysfluency segmentation as a graph partitioning problem, where speech embeddings from overlapping windows are represented as graph nodes. We refine the connections between nodes using a pseudo-oracle classifier trained on weak (utterance-level) labels, with its influence controlled by an uncertainty measure from Monte Carlo dropout. Additionally, we extend the weakly labelled FluencyBank dataset by incorporating frame-level dysfluency boundaries for four dysfluency types. This provides a more realistic benchmark compared to synthetic datasets. Experiments on real and synthetic datasets show that StutterCut outperforms existing methods, achieving higher F1 scores and more precise stuttering onset detection.
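The graph formulation in this abstract can be illustrated with a minimal sketch. It is not the StutterCut implementation: it assumes toy two-dimensional embeddings, uses clipped cosine similarity as a hypothetical edge weight, leaves out the pseudo-oracle classifier and the Monte Carlo dropout weighting entirely, and searches contiguous split points by brute force instead of solving the usual eigenvector relaxation. It only shows how windows become graph nodes and how the normalised-cut value of a partition is scored.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def affinity_matrix(embeddings):
    """Edge weights between window nodes: cosine similarity clipped at zero, no self-loops."""
    n = len(embeddings)
    return [
        [0.0 if i == j else max(0.0, cosine(embeddings[i], embeddings[j]))
         for j in range(n)]
        for i in range(n)
    ]


def ncut(weights, group_a):
    """Normalised-cut value of the partition (group_a, all remaining nodes)."""
    n = len(weights)
    a = set(group_a)
    b = set(range(n)) - a
    cut = sum(weights[i][j] for i in a for j in b)
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # degenerate partition, never preferred
    return cut / assoc_a + cut / assoc_b


def best_contiguous_split(weights):
    """Score every split point k (nodes 0..k-1 vs k..n-1) and return (k, ncut value)."""
    n = len(weights)
    value, k = min((ncut(weights, range(k)), k) for k in range(1, n))
    return k, value


# Toy data (hypothetical): windows 0-2 resemble each other, windows 3-5 resemble each other.
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [1.0, 0.1],
    [0.1, 1.0], [0.0, 1.0], [0.1, 0.9],
]
split, value = best_contiguous_split(affinity_matrix(toy_embeddings))
print(split, round(value, 3))  # the lowest-cost boundary falls at window 3
```

Restricting the search to contiguous splits keeps the sketch short and reflects that speech windows are ordered in time; a general normalised cut places no such restriction on the partition.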

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2508.02255

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language (0.88)

Seamless Dysfluent Speech Text Alignment for Disordered Speech Analysis

Ye, Zongli, Lian, Jiachen, Zhou, Xuanru, Zhang, Jinming, Li, Haodong, Li, Shuhe, Guo, Chenxu, Das, Anaisha, Park, Peter, Ezzes, Zoe, Vonk, Jet, Morin, Brittany, Bogley, Rian, Wauters, Lisa, Miller, Zachary, Gorno-Tempini, Maria, Anumanchipalli, Gopala

arXiv.org Artificial Intelligence | Jun-17-2025

Accurate alignment of dysfluent speech with intended text is crucial for automating the diagnosis of neurodegenerative speech disorders. Traditional methods often fail to model phoneme similarities effectively, limiting their performance. In this work, we propose Neural LCS, a novel approach for dysfluent text-text and speech-text alignment. Neural LCS addresses key challenges, including partial alignment and context-aware similarity mapping, by leveraging robust phoneme-level modeling. We evaluate our method on a large-scale simulated dataset, generated using advanced data simulation techniques, and real PPA data. Neural LCS significantly outperforms state-of-the-art models in both alignment accuracy and dysfluent speech segmentation. Our results demonstrate the potential of Neural LCS to enhance automated systems for diagnosing and analyzing speech disorders, offering a more accurate and linguistically grounded solution for dysfluent speech alignment.
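For readers unfamiliar with the underlying idea, here is a minimal sketch of the classic longest-common-subsequence (LCS) alignment that the method's name alludes to. This is the textbook exact-match dynamic program, not Neural LCS: it has no learned phoneme similarity and no context-aware mapping. The phoneme strings and the function names `lcs_align` and `unmatched_spoken` are illustrative assumptions.

```python
def lcs_align(reference, spoken):
    """Return index pairs (i, j) such that reference[i] is aligned to spoken[j]."""
    n, m = len(reference), len(spoken)
    # table[i][j] holds the LCS length of reference[i:] and spoken[j:].
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if reference[i] == spoken[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk forward through the table to recover one optimal alignment.
    pairs, i, j = [], 0, 0
    while i < n and j < m:
        if reference[i] == spoken[j]:
            pairs.append((i, j))
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skip a reference phoneme (possible deletion)
        else:
            j += 1  # skip a spoken phoneme (possible insertion or repetition)
    return pairs


def unmatched_spoken(spoken, pairs):
    """Indices of spoken phonemes that were not aligned to any reference phoneme."""
    matched = {j for _, j in pairs}
    return [j for j in range(len(spoken)) if j not in matched]


# Hypothetical example: the intended word with its first phoneme repeated.
reference = ["P", "L", "IY", "Z"]
spoken = ["P", "P", "L", "IY", "Z"]
pairs = lcs_align(reference, spoken)
print(pairs)                            # [(0, 0), (1, 2), (2, 3), (3, 4)]
print(unmatched_spoken(spoken, pairs))  # [1]
```

Exact matching treats acoustically close phonemes as entirely different symbols, which is the kind of limitation the abstract attributes to traditional methods.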

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

2506.12073

Country: North America > United States (0.28)

Genre:

Research Report > Promising Solution (0.68)
Research Report > New Finding (0.68)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.68)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Dysfluent WFST: A Framework for Zero-Shot Speech Dysfluency Transcription and Detection

Guo, Chenxu, Lian, Jiachen, Zhou, Xuanru, Zhang, Jinming, Li, Shuhe, Ye, Zongli, Park, Hwi Joo, Das, Anaisha, Ezzes, Zoe, Vonk, Jet, Morin, Brittany, Bogley, Rian, Wauters, Lisa, Miller, Zachary, Gorno-Tempini, Maria, Anumanchipalli, Gopala

arXiv.org Artificial Intelligence | May-27-2025

Automatic detection of speech dysfluency aids speech-language pathologists in efficient transcription of disordered speech, enhancing diagnostics and treatment planning. Traditional methods, often limited to classification, provide insufficient clinical insight, and text-independent models misclassify dysfluency, especially in context-dependent cases. This work introduces Dysfluent-WFST, a zero-shot decoder that simultaneously transcribes phonemes and detects dysfluency. Unlike previous models, Dysfluent-WFST operates with upstream encoders like WavLM and requires no additional training. It achieves state-of-the-art performance in both phonetic error rate and dysfluency detection on simulated and real speech data. Our approach is lightweight, interpretable, and effective, demonstrating that explicit modeling of pronunciation behavior in decoding, rather than complex architectures, is key to improving dysfluency processing systems.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.16351

Country: North America > United States (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Therapeutic Area (0.54)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Time and Tokens: Benchmarking End-to-End Speech Dysfluency Detection

Zhou, Xuanru, Lian, Jiachen, Cho, Cheol Jun, Liu, Jingwen, Ye, Zongli, Zhang, Jinming, Morin, Brittany, Baquirin, David, Vonk, Jet, Ezzes, Zoe, Miller, Zachary, Tempini, Maria Luisa Gorno, Anumanchipalli, Gopala

arXiv.org Artificial Intelligence | Sep-20-2024

Speech dysfluency modeling is a task to detect dysfluencies in speech, such as repetition, block, insertion, replacement, and deletion. Most recent advancements treat this problem as a time-based object detection problem. In this work, we revisit this problem from a new perspective: tokenizing dysfluencies and modeling the detection problem as a token-based automatic speech recognition (ASR) problem. We propose rule-based speech and text dysfluency simulators and develop VCTK-token, and then develop a Whisper-like seq2seq architecture to build a new benchmark with decent performance. We also systematically compare our proposed token-based methods with time-based methods, and propose a unified benchmark to facilitate future research endeavors. We open-source these resources for the broader scientific community. The project page is available at https://rorizzz.github.io/
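A rule-based text dysfluency simulator of the kind mentioned in this abstract might look roughly like the sketch below. The marker tokens `[REP]` and `[DEL]`, the function names, and the output layout are all hypothetical; the actual token inventory and rules used to build VCTK-token are not given in the abstract. The point is only that a dysfluency becomes an ordinary token in the target sequence, so a seq2seq recogniser can predict it like any other word.

```python
REP = "[REP]"  # hypothetical marker token for a repetition
DEL = "[DEL]"  # hypothetical marker token for a deleted word


def inject_repetition(words, index, times=1):
    """Emit words[index] `times` extra times, then REP, then the original word."""
    if not 0 <= index < len(words):
        raise IndexError("index out of range")
    if times < 1:
        raise ValueError("times must be at least 1")
    return words[:index] + [words[index]] * times + [REP] + words[index:]


def inject_deletion(words, index):
    """Replace words[index] with the DEL marker."""
    if not 0 <= index < len(words):
        raise IndexError("index out of range")
    return words[:index] + [DEL] + words[index + 1:]


def dysfluency_tokens(sequence):
    """List (position, token) for every marker token found in the sequence."""
    return [(i, tok) for i, tok in enumerate(sequence) if tok in (REP, DEL)]


# Hypothetical usage on a short prompt.
fluent = ["please", "call", "stella"]
print(inject_repetition(fluent, 1))  # ['please', 'call', '[REP]', 'call', 'stella']
print(inject_deletion(fluent, 1))    # ['please', '[DEL]', 'stella']
```

Because both functions return new lists, the fluent reference stays unchanged and can be reused to generate several simulated variants.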

detection, dysfluency, speech, (16 more...)

arXiv.org Artificial Intelligence

2409.13582

Country:

Europe > Netherlands > Gelderland > Nijmegen (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > South Korea > Incheon > Incheon (0.04)

Genre: Research Report (0.64)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

YOLO-Stutter: End-to-end Region-Wise Speech Dysfluency Detection

Zhou, Xuanru, Kashyap, Anshul, Li, Steve, Sharma, Ayati, Morin, Brittany, Baquirin, David, Vonk, Jet, Ezzes, Zoe, Miller, Zachary, Tempini, Maria Luisa Gorno, Lian, Jiachen, Anumanchipalli, Gopala Krishna

arXiv.org Artificial Intelligence | Sep-15-2024

Dysfluent speech detection is the bottleneck for disordered speech analysis and spoken language learning. Current state-of-the-art models are governed by rule-based systems which lack efficiency and robustness, and are sensitive to template design. In this paper, we propose YOLO-Stutter: a first end-to-end method that detects dysfluencies in a time-accurate manner. YOLO-Stutter takes imperfect speech-text alignment as input, followed by a spatial feature aggregator and a temporal dependency extractor, to perform region-wise boundary and class predictions. We also introduce two dysfluency corpora, VCTK-Stutter and VCTK-TTS, that simulate natural spoken dysfluencies including repetition, block, missing, replacement, and prolongation. Our end-to-end method achieves state-of-the-art performance with a minimum number of trainable parameters on both simulated data and real aphasia speech. Code and datasets are open-sourced at https://github.com/rorizzz/YOLO-Stutter

artificial intelligence, dysfluency, speech recognition, (19 more...)

arXiv.org Artificial Intelligence

2408.15297

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > California > Alameda County > Berkeley (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > China (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.70)

Stutter-Solver: End-to-end Multi-lingual Dysfluency Detection

Zhou, Xuanru, Cho, Cheol Jun, Sharma, Ayati, Morin, Brittany, Baquirin, David, Vonk, Jet, Ezzes, Zoe, Miller, Zachary, Tee, Boon Lead, Tempini, Maria Luisa Gorno, Lian, Jiachen, Anumanchipalli, Gopala

arXiv.org Artificial Intelligence | Sep-15-2024

Current de-facto dysfluency modeling methods utilize template matching algorithms which are not generalizable to out-of-domain real-world dysfluencies across languages, and are not scalable with increasing amounts of training data. To handle these problems, we propose Stutter-Solver: an end-to-end framework that detects dysfluency with accurate type and time transcription, inspired by the YOLO object detection algorithm. Stutter-Solver can handle co-dysfluencies and is a natural multi-lingual dysfluency detector. To leverage scalability and boost performance, we also introduce three novel dysfluency corpora: VCTK-Pro, VCTK-Art, and AISHELL3-Pro, simulating natural spoken dysfluencies including repetition, block, missing, replacement, and prolongation through articulatory-encodec and TTS-based methods. Our approach achieves state-of-the-art performance on all available dysfluency corpora. Code and datasets are open-sourced at https://github.com/eureka235/Stutter-Solver

dysfluency, speech, stutter-solver, (17 more...)

arXiv.org Artificial Intelligence

2409.09621

Country:

Europe > Netherlands > Gelderland > Nijmegen (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

SSDM: Scalable Speech Dysfluency Modeling

Lian, Jiachen, Zhou, Xuanru, Ezzes, Zoe, Vonk, Jet, Morin, Brittany, Baquirin, David, Miller, Zachary, Tempini, Maria Luisa Gorno, Anumanchipalli, Gopala

arXiv.org Artificial Intelligence | Sep-14-2024

Speech dysfluency modeling is the core module for spoken language learning and speech therapy. However, there are three challenges. First, current state-of-the-art solutions suffer from poor scalability. Second, there is a lack of a large-scale dysfluency corpus. Third, there is no effective learning framework. In this paper, we propose SSDM: Scalable Speech Dysfluency Modeling, which (1) adopts articulatory gestures as scalable forced alignment; (2) introduces a connectionist subsequence aligner (CSA) to achieve dysfluency alignment; (3) introduces a large-scale simulated dysfluency corpus called Libri-Dys; and (4) develops an end-to-end system by leveraging the power of large language models (LLMs). We expect SSDM to serve as a standard in the area of dysfluency modeling. Demo is available at https://eureka235.github.io.

alignment, arxiv preprint arxiv, gestural score, (15 more...)

arXiv.org Artificial Intelligence

2408.16221

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(6 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Large Language Models for Dysfluency Detection in Stuttered Speech

Wagner, Dominik, Bayerl, Sebastian P., Baumann, Ilja, Riedhammer, Korbinian, Nöth, Elmar, Bocklet, Tobias

arXiv.org Artificial Intelligence | Jun-16-2024

Accurately detecting dysfluencies in spoken language can help to improve the performance of automatic speech and language processing components and support the development of more inclusive speech and language technologies. Inspired by the recent trend towards the deployment of large language models (LLMs) as universal learners and processors of non-lexical inputs, such as audio and video, we approach the task of multi-label dysfluency detection as a language modeling problem. We present hypothesis candidates generated with an automatic speech recognition system and acoustic representations extracted from an audio encoder model to an LLM, and finetune the system to predict dysfluency labels on three datasets containing English and German stuttered speech. The experimental results show that our system effectively combines acoustic and lexical information and achieves competitive results on the multi-label stuttering detection task.

acoustic feature, dataset, representation, (14 more...)

arXiv.org Artificial Intelligence

2406.11025

Country: Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.14)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
