Clinical Annotations for Automatic Stuttering Severity Assessment
Valente, Ana Rita, Marew, Rufael, Toyin, Hawau Olamide, Al-Ali, Hamdan, Bohnen, Anelise, Becerra, Inma, Soares, Elsa Marta, Leal, Goncalo, Aldarmaki, Hanan
Stuttering is a complex disorder that requires specialized expertise for effective assessment and treatment. This paper presents an effort to enhance the FluencyBank dataset with a new stuttering annotation scheme based on established clinical standards. To achieve high-quality annotations, we hired expert clinicians to label the data, ensuring that the resulting annotations mirror real-world clinical expertise. The annotations are multi-modal, incorporating audiovisual features for the detection and classification of stuttering moments, secondary behaviors, and tension scores. In addition to individual annotations, we provide a test set with highly reliable annotations based on expert consensus for assessing individual annotators and machine learning models. Our experiments and analysis illustrate the complexity of this task, which necessitates extensive clinical expertise for valid training and evaluation of stuttering assessment models.
Boli: A dataset for understanding stuttering experience and analyzing stuttered speech
Batra, Ashita, Narang, Mannas, Sharma, Neeraj Kumar, Das, Pradip K
There is a growing need for diverse, high-quality stuttered speech data, particularly in the context of Indian languages. This paper introduces Project Boli, a multi-lingual stuttered speech dataset designed to advance scientific understanding and technology development for individuals who stutter, particularly in India. The dataset (a) contains anonymized metadata (gender, age, country, mother tongue) and responses to a questionnaire about how stuttering affects participants' daily lives, (b) captures both read speech (using the Rainbow Passage) and spontaneous speech (through image description tasks) for each participant, and (c) includes detailed annotations of five stutter types: blocks, prolongations, interjections, sound repetitions, and word repetitions. We present a comprehensive analysis of the dataset, including the data collection procedure, a summary of the experiences of people who stutter, severity assessment of stuttering events, and technical validation of the collected data. The dataset is released as open access to further speech technology development.
Inclusive ASR for Disfluent Speech: Cascaded Large-Scale Self-Supervised Learning with Targeted Fine-Tuning and Data Augmentation
Mujtaba, Dena, Mahapatra, Nihar R., Arney, Megan, Yaruss, J. Scott, Herring, Caryn, Bin, Jia
Automatic speech recognition (ASR) systems often falter while processing stuttering-related disfluencies -- such as involuntary blocks and word repetitions -- yielding inaccurate transcripts. A critical barrier to progress is the scarcity of large, annotated disfluent speech datasets. Therefore, we present an inclusive ASR design approach, leveraging large-scale self-supervised learning on standard speech followed by targeted fine-tuning and data augmentation on a smaller, curated dataset of disfluent speech. Our data augmentation technique enriches training datasets with various disfluencies, enhancing ASR processing of these speech patterns. Results show that fine-tuning wav2vec 2.0 with even a relatively small, labeled dataset, alongside data augmentation, can significantly reduce word error rates for disfluent speech. Our approach not only advances ASR inclusivity for people who stutter, but also paves the way for ASRs that can accommodate wider speech variations.
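The data augmentation described above enriches training transcripts with stuttering-related disfluencies. A minimal, text-side-only sketch of one such augmentation (whole-word repetition) is shown below; the function name and rate parameter are illustrative assumptions, and a real pipeline would also splice the corresponding audio segments:

```python
import random

def add_word_repetitions(words, rate, rng):
    """Duplicate random words to simulate whole-word repetitions.

    This covers only the transcript side; matched audio augmentation
    would require time-aligned splicing of the repeated segments.
    """
    out = []
    for w in words:
        out.append(w)
        if rng.random() < rate:
            out.append(w)  # simulated repetition, e.g. "go go"
    return out

rng = random.Random(0)
print(" ".join(add_word_repetitions("i want to go home".split(), 0.3, rng)))
```

Seeding the generator keeps augmented corpora reproducible across training runs, which matters when comparing fine-tuned checkpoints.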
Lost in Transcription: Identifying and Quantifying the Accuracy Biases of Automatic Speech Recognition Systems Against Disfluent Speech
Mujtaba, Dena, Mahapatra, Nihar R., Arney, Megan, Yaruss, J. Scott, Gerlach-Houck, Hope, Herring, Caryn, Bin, Jia
Automatic speech recognition (ASR) systems, increasingly prevalent in education, healthcare, employment, and mobile technology, face significant challenges in inclusivity, particularly for the 80 million-strong global community of people who stutter. These systems often fail to accurately interpret speech patterns deviating from typical fluency, leading to critical usability issues and misinterpretations. This study evaluates six leading ASRs, analyzing their performance on both a real-world dataset of speech samples from individuals who stutter and a synthetic dataset derived from the widely-used LibriSpeech benchmark. The synthetic dataset, uniquely designed to incorporate various stuttering events, enables an in-depth analysis of each ASR's handling of disfluent speech. Our comprehensive assessment includes metrics such as word error rate (WER), character error rate (CER), and semantic accuracy of the transcripts. The results reveal a consistent and statistically significant accuracy bias across all ASRs against disfluent speech, manifesting in significant syntactical and semantic inaccuracies in transcriptions. These findings highlight a critical gap in current ASR technologies, underscoring the need for effective bias mitigation strategies. Addressing this bias is imperative not only to improve the technology's usability for people who stutter but also to ensure their equitable and inclusive participation in the rapidly evolving digital landscape.
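The WER and CER metrics used in the evaluation above are standard edit-distance ratios. A minimal pure-Python sketch (the function names are illustrative; production evaluations typically normalize text first):

```python
def edit_distance(ref, hyp):
    # Classic dynamic-programming Levenshtein distance over tokens.
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[m][n]

def wer(reference, hypothesis):
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)

def cer(reference, hypothesis):
    return edit_distance(list(reference), list(hypothesis)) / len(reference)

# An ASR that transcribes a word repetition verbatim incurs one insertion:
print(wer("please call stella", "please please call stella"))  # → 0.333...
```

Because disfluencies like repetitions show up as insertions against a fluent reference, WER alone can penalize faithful transcription, which is why the study also measures semantic accuracy.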
Stutter-TTS: Controlled Synthesis and Improved Recognition of Stuttered Speech
Zhang, Xin, Vallés-Pérez, Iván, Stolcke, Andreas, Yu, Chengzhu, Droppo, Jasha, Shonibare, Olabanji, Barra-Chicote, Roberto, Ravichandran, Venkatesh
Stuttering is a speech disorder where the natural flow of speech is interrupted by blocks, repetitions or prolongations of syllables, words and phrases. The majority of existing automatic speech recognition (ASR) interfaces perform poorly on utterances with stutter, mainly due to lack of matched training data. Synthesis of speech with stutter thus presents an opportunity to improve ASR for this type of speech. We describe Stutter-TTS, an end-to-end neural text-to-speech model capable of synthesizing diverse types of stuttering utterances. We develop a simple, yet effective prosody-control strategy whereby additional tokens are introduced into source text during training to represent specific stuttering characteristics. By choosing the position of the stutter tokens, Stutter-TTS allows word-level control of where stuttering occurs in the synthesized utterance. We are able to synthesize stutter events with high accuracy (F1-scores between 0.63 and 0.84, depending on stutter type). By fine-tuning an ASR model on synthetic stuttered speech we are able to reduce word error by 5.7% relative on stuttered utterances, with only minor (< 0.2% relative) degradation for fluent utterances.
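The prosody-control strategy above amounts to inserting control tokens into the source text at the desired word positions. A minimal sketch of that token-insertion step, with a hypothetical token vocabulary (the paper's actual token set is not specified here):

```python
# Hypothetical control tokens; Stutter-TTS's real vocabulary may differ.
STUTTER_TOKENS = {"repetition": "[rep]", "prolongation": "[pro]", "block": "[blk]"}

def insert_stutter_tokens(text, events):
    """Insert stutter control tokens before the given word indices.

    events: list of (word_index, stutter_type) pairs, giving word-level
    control over where stuttering occurs in the synthesized utterance.
    """
    words = text.split()
    # Process right-to-left so earlier insertions don't shift later indices.
    for idx, kind in sorted(events, reverse=True):
        words.insert(idx, STUTTER_TOKENS[kind])
    return " ".join(words)

print(insert_stutter_tokens("the quick brown fox", [(0, "repetition"), (2, "block")]))
# → "[rep] the quick [blk] brown fox"
```

During training the tokens are placed where stutter events actually occur in the reference audio, so at inference time placing a token steers the model to synthesize that event at that position.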
Fluent: An AI Augmented Writing Tool for People who Stutter - Technology Org
Stuttering is a disorder that negatively affects personal and professional life. One of the factors that may influence the likelihood of stuttering is phonological patterns: some words are more prone to cause stuttering than others, and people who stutter (PWS) can identify which words they might struggle with and plan ways to manage them. Recent advancements in AI, such as phonetic embeddings, can help simplify this process. A recent paper therefore presents a novel machine-in-the-loop writing tool for assisting PWS with writing scripts that minimize the number of stuttering events.
Fluent: An AI Augmented Writing Tool for People who Stutter
Stuttering is a speech disorder that impacts the personal and professional lives of millions of people worldwide. To save themselves from stigma and discrimination, people who stutter (PWS) may adopt different strategies to conceal their stuttering. One of the common strategies is word substitution, where an individual avoids saying a word they might stutter on and uses an alternative instead. This process itself can cause stress and add further burden. In this work, we present Fluent, an AI-augmented writing tool that assists PWS in writing scripts they can speak more fluently. Fluent embodies a novel active-learning-based method of identifying words an individual might struggle to pronounce. Such words are highlighted in the interface. On hovering over any such word, Fluent presents a set of alternative words that have similar meaning but are easier to speak. The user is free to accept or ignore these suggestions. Based on such user interaction (feedback), Fluent continuously evolves its classifier to better suit the personalized needs of each user. We evaluated our tool by measuring its ability to identify difficult words for 10 simulated users. We found that our tool can identify difficult words with a mean accuracy of over 80% in under 20 interactions, and it keeps improving with more feedback. Our tool can be beneficial in important life situations such as giving a talk or presentation. The source code for this tool has been made publicly accessible at github.com/bhavyaghai/Fluent.
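The feedback loop described above — highlight candidate words, let the user accept or reject, update the classifier — can be sketched as a tiny online learner. Everything below is a simplification: the class name and threshold are invented for illustration, and initial letters stand in for the phonetic embeddings Fluent actually uses:

```python
from collections import defaultdict

class DifficultWordModel:
    """Minimal online learner in the spirit of Fluent's feedback loop."""

    def __init__(self, threshold=0.5):
        self.weights = defaultdict(float)  # feature -> accumulated "difficult" votes
        self.counts = defaultdict(int)     # feature -> total feedback events
        self.threshold = threshold

    def _feature(self, word):
        # Crude stand-in for a phonetic representation: the initial sound.
        return word[0].lower()

    def is_difficult(self, word):
        f = self._feature(word)
        if self.counts[f] == 0:
            return False  # no evidence yet; don't highlight
        return self.weights[f] / self.counts[f] >= self.threshold

    def feedback(self, word, difficult):
        """User confirms (True) or rejects (False) a highlighted word."""
        f = self._feature(word)
        self.counts[f] += 1
        self.weights[f] += 1.0 if difficult else 0.0

model = DifficultWordModel()
model.feedback("paris", True)    # user marks /p/-initial words as hard
model.feedback("people", True)
model.feedback("table", False)
print(model.is_difficult("program"))  # → True
print(model.is_difficult("time"))     # → False
```

The design choice mirrors the paper's premise: difficulty generalizes across phonetically similar words, so a few interactions are enough to start personalizing the highlights.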
For people who stutter, the convenience of voice assistant technology remains out of reach
Do you ever feel as if your voice assistants – whether Siri, Alexa, or Google – don't understand you? You might repeat your question a little slower, a little louder, but eventually you'll get the information you were asking for read back to you in the pleasing but lifeless tones of your voice-activated assistant. That's the question facing many of the 3 million people in the United States who stutter, plus the thousands of others who have impaired speech not limited to stuttering, and many are feeling left out. "When this stuff first started coming out, I was all over it," said Jacquelyn Joyce Revere, a screenwriter from Los Angeles who stutters. "In LA, I need GPS all the time, so this seemed like a more convenient way to live the life I want to live."
Detecting Multiple Speech Disfluencies using a Deep Residual Network with Bidirectional Long Short-Term Memory
Kourkounakis, Tedd, Hajavi, Amirhossein, Etemad, Ali
Stuttering is a speech impediment affecting tens of millions of people on an everyday basis. Even with its commonality, there is minimal data and research on the identification and classification of stuttered speech. This paper tackles the problem of detection and classification of different forms of stutter. As opposed to most existing works that identify stutters with language models, our work proposes a model that relies solely on acoustic features, allowing for identification of several variations of stutter disfluencies without the need for speech recognition. Our model uses a deep residual network and bidirectional long short-term memory layers to classify different types of stutters and achieves an average miss rate of 10.03%, outperforming the state-of-the-art by almost 27%.
People With Speech Disabilities Are Being Left Out of the Voice-Assistant Revolution
When Whitney Bailey bought an Amazon Echo, she wanted to use the hands-free calling feature in case she fell and couldn't reach her phone. She hoped that it would offer her family some peace of mind and help make life a little easier. In some ways, she says, it does. But because she has cerebral palsy, her voice is strained when she talks, and she struggles to get Alexa to understand her. To make matters worse, having to repeat commands strains her voice even more.