AITopics | speech impairment

A Cookbook for Community-driven Data Collection of Impaired Speech in Low-Resource Languages

Salihs, Sumaya Ahmed, Wiafe, Isaac, Abdulai, Jamal-Deen, Atsakpo, Elikem Doe, Ayoka, Gifty, Cave, Richard, Ekpezu, Akon Obu, Holloway, Catherine, Tomanek, Katrin, Winful, Fiifi Baffoe Payin

arXiv.org Artificial Intelligence | Jul-4-2025

This study presents an approach for collecting speech samples to build Automatic Speech Recognition (ASR) models for impaired speech, particularly in low-resource languages. It aims to democratize ASR technology and data collection by developing a "cookbook" of best practices and training for community-driven data collection and ASR model building. As a proof-of-concept, this study curated the first open-source dataset of impaired speech in Akan, a widely spoken indigenous language in Ghana. The study involved participants from diverse backgrounds with speech impairments. The resulting dataset, along with the cookbook and open-source tools, is publicly available to enable researchers and practitioners to create inclusive ASR technologies tailored to the unique needs of speech-impaired individuals. In addition, this study presents the initial results of fine-tuning open-source ASR models to better recognize impaired speech in Akan.

artificial intelligence, participant, speech recognition, (14 more...)

arXiv.org Artificial Intelligence

2507.02428

Country:

Africa > Ghana (0.69)
Europe > United Kingdom (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Adapting Foundation Speech Recognition Models to Impaired Speech: A Semantic Re-chaining Approach for Personalization of German Speech

Pokel, Niclas, Moure, Pehuén, Boehringer, Roman, Gao, Yingqiang

arXiv.org Artificial Intelligence | Jun-30-2025

Speech impairments caused by conditions such as cerebral palsy or genetic disorders pose significant challenges for automatic speech recognition (ASR) systems. Despite recent advances, ASR models like Whisper struggle with non-normative speech due to limited training data and the difficulty of collecting and annotating non-normative speech samples. In this work, we propose a practical and lightweight pipeline to personalize ASR models, formalizing the selection of words and enriching a small, speech-impaired dataset with semantic coherence. Applied to data from a child with a structural speech impairment, our approach shows promising improvements in transcription quality, demonstrating the potential to reduce communication barriers for individuals with atypical speech patterns.

artificial intelligence, machine learning, utterance, (14 more...)

arXiv.org Artificial Intelligence

2506.21622

Country:

Europe > Switzerland > Zürich > Zürich (0.15)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.66)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Inclusivity of AI Speech in Healthcare: A Decade Look Back

Larasati, Retno

arXiv.org Artificial Intelligence | May-19-2025

The integration of AI speech recognition technologies into healthcare has the potential to revolutionize clinical workflows and patient-provider communication. However, this study reveals significant gaps in inclusivity, with datasets and research disproportionately favouring high-resource languages, standardized accents, and narrow demographic groups. These biases risk perpetuating healthcare disparities, as AI systems may misinterpret speech from marginalized groups. This paper highlights the urgent need for inclusive dataset design, bias mitigation research, and policy frameworks to ensure equitable access to AI speech technologies in healthcare.

artificial intelligence, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2505.10596

Country:

Asia (0.29)
Europe (0.28)
Oceania (0.28)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Health Care Access (0.48)
Health & Medicine > Health Care Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.93)

Careless Whisper: Speech-to-Text Hallucination Harms

Koenecke, Allison, Choi, Anna Seo Gyeong, Mei, Katelyn, Schellmann, Hilke, Sloane, Mona

arXiv.org Artificial Intelligence | Feb-12-2024

Use of such speech-to-text APIs is increasingly prevalent in high-stakes downstream applications, ranging from surveillance of incarcerated people [22] to medical care [14]. While such speech-to-text APIs can generate written transcriptions more quickly than human transcribers, there are grave concerns regarding bias in automated transcription accuracy, e.g., underperformance for African American English speakers [11] and speakers with speech impairments such as dysphonia [12]. These biases within APIs can perpetuate disparities when real-world decisions are made based on automated speech-to-text transcriptions, from police making carceral judgements to doctors making treatment decisions. OpenAI released its Whisper speech-to-text API in September 2022 with experiments showing better speech transcription accuracy relative to market competitors [19]. We evaluate Whisper's transcription performance on the axis of "hallucinations," defined as undesirable generated text "that is nonsensical, or unfaithful to the provided source input" [10]. Our approach compares the ground truth of a speech snippet with the outputted transcription; we find hallucinations in roughly 1% of transcriptions generated in mid-2023, wherein Whisper hallucinates entire made-up sentences when no one is speaking in the input audio files. While hallucinations have been increasingly studied in the context of text generated by ChatGPT (a language model also made by OpenAI) [8, 10], hallucinations have only been considered in speech-to-text models as a means to study error prediction [21], and not as a fundamental concern in and of itself. In this paper, we provide experimental quantification of Whisper hallucinations, finding that nearly 40% of the hallucinations are harmful or concerning in some way (as opposed to innocuous and random).

audio segment, hallucination, transcription, (15 more...)

arXiv.org Artificial Intelligence

2402.08021

Country:

North America > United States > Virginia (0.04)
North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology (1.00)
Health & Medicine > Therapeutic Area (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.55)

An analysis of degenerating speech due to progressive dysarthria on ASR performance

Tomanek, Katrin, Seaver, Katie, Jiang, Pan-Pan, Cave, Richard, Harrel, Lauren, Green, Jordan R.

arXiv.org Artificial Intelligence | Oct-31-2022

Although personalized automatic speech recognition (ASR) models have recently been designed to recognize even severely impaired speech, model performance may degrade over time for persons with degenerating speech. The aims of this study were to (1) analyze the change of performance of ASR over time in individuals with degrading speech, and (2) explore mitigation strategies to optimize recognition throughout disease progression. Speech was recorded by four individuals with degrading speech due to amyotrophic lateral sclerosis (ALS). Word error rates (WER) across recording sessions were computed for three ASR models: Unadapted Speaker Independent (U-SI), Adapted Speaker Independent (A-SI), and Adapted Speaker Dependent (A-SD or personalized). The performance of all three models degraded significantly over time as speech became more impaired, but the performance of the A-SD model improved markedly when it was updated with recordings from the severe stages of speech progression. Recording additional utterances early in the disease before speech degraded significantly did not improve the performance of A-SD models. Overall, our findings emphasize the importance of continuous recording (and model retraining) when providing personalized models for individuals with progressive speech impairments.

artificial intelligence, machine learning, speech, (14 more...)

arXiv.org Artificial Intelligence

2211.00089

Country:

Europe > United Kingdom (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Experimental Study (0.67)

Industry: Health & Medicine > Therapeutic Area > Neurology > Amyotrophic Lateral Sclerosis (ALS) (0.36)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Assessing ASR Model Quality on Disordered Speech using BERTScore

Tobin, Jimmy, Li, Qisheng, Venugopalan, Subhashini, Seaver, Katie, Cave, Richard, Tomanek, Katrin

arXiv.org Artificial Intelligence | Sep-21-2022

Word Error Rate (WER) is the primary metric used to assess automatic speech recognition (ASR) model quality. It has been shown that ASR models tend to have much higher WER on speakers with speech impairments than on typical English speakers. It is hard to determine if models can be useful at such high error rates. This study investigates the use of BERTScore, an evaluation metric for text generation, to provide a more informative measure of ASR model quality and usefulness. Both BERTScore and WER were compared to prediction errors manually annotated by Speech Language Pathologists for error type and assessment. BERTScore was found to be more correlated with human assessment of error type and assessment. BERTScore was specifically more robust to orthographic changes (contraction and normalization errors) where meaning was preserved. Furthermore, BERTScore was a better fit of error assessment than WER, as measured using an ordinal logistic regression and the Akaike Information Criterion (AIC). Overall, our findings suggest that BERTScore can complement WER when assessing ASR model performance from a practical perspective, especially for accessibility applications where models are useful even at lower accuracy than for typical speech.
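As a rough illustration of why WER can mislead on orthographic changes, the sketch below computes WER as word-level edit distance divided by the reference length. It is a minimal textbook version, not the evaluation code used in the paper; the function name `word_error_rate` and the example sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcripts, chosen only for illustration.
reference = "i do not want to go"
contraction = "i don't want to go"     # meaning preserved
wrong_meaning = "i do now want to go"  # negation lost

print(word_error_rate(reference, contraction))
print(word_error_rate(reference, wrong_meaning))
```

In this example the meaning-preserving contraction costs two edits (one substitution and one deletion) while the meaning-changing error costs only one, so WER ranks the harmless transcript as worse. A semantic metric such as BERTScore is intended to reduce that kind of mismatch.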

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2209.10591

Country:

North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.56)

Podcast: How AI is giving a woman back her voice

MIT Technology Review | Dec-8-2021, 05:50:00 GMT

Voice technology is one of the biggest trends in the healthcare space. We look at how it might help care providers and patients, from a woman who is losing her speech, to documenting healthcare records for doctors. But how do you teach AI to learn to communicate more like a human, and will it lead to more efficient machines? This episode was reported and produced by Anthony Green with help from Jennifer Strong and Emma Cillekens. It was edited by Michael Reilly. Our mix engineer is Garret Lang and our theme music is by Jacob Gorski. Jennifer: Healthcare looks a little different than it did not so long ago…when your doctor likely wrote down details about your condition on a piece of paper...

andrea peet, hod lipson, ken harper, (12 more...)

MIT Technology Review

Country:

North America > United States > New York (0.04)
North America > United States > California (0.04)

Industry: Health & Medicine > Health Care Technology (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.95)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)

Watch: Google unveils new AI app to help people with speech impairments

#artificialintelligence | Nov-11-2021, 10:11:07 GMT

Google is seeking volunteers for a new beta app called Project Relate, which aims to provide people with speech impairments with a voice assistant that can transcribe their speech in real time as well as synthesize what they are saying. The app is part of Project Euphonia, a wider endeavor started in 2019 that is aimed at collecting data to improve how Google's AI algorithms handle speech from people who "have difficulty being understood by others," such as those affected by neurological conditions. As for the Relate app, it has three key features. The Listen feature will transcribe a user's speech in real time, allowing them to copy and paste into other apps or show to other people. The Repeat feature will restate what the user is saying in a "clear synthesized voice," which Google hopes will aid face-to-face conversations and help when people with speech impairments want to speak a command to a smart home device.

app, google, speech impairment, (5 more...)

#artificialintelligence

Country:

Oceania > New Zealand (0.06)
Oceania > Australia (0.06)
North America > United States (0.06)
North America > Canada (0.06)

Industry:

Information Technology (0.76)
Health & Medicine > Therapeutic Area > Neurology (0.38)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.38)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.38)

Google made an app to ease communication for people with speech impairments

Engadget | Nov-9-2021, 20:30:16 GMT

For too long, people with speech impairments have struggled to be understood not only by other people, but also by voice-based technology. Though some companies have started to make their products work better for people with atypical speech, the most prevalent services still don't hear them well. Google announced today that it has made a new Android app called Project Relate that could help people with speech impairments communicate more easily with others and with the Assistant. It's looking for beta testers to test and improve the app starting today. As product manager for Google Research Julie Cattiau said in a video, "standard speech recognition doesn't always work as well for people with atypical speech because the algorithms have not been trained on samples of their speech."

atypical speech, ease communication, speech impairment, (2 more...)

Engadget

Genre: Press Release (0.61)

Technology:

Information Technology > Communications > Mobile (0.64)
Information Technology > Artificial Intelligence > Speech (0.45)

Text to Speech Technology: How Voice Computing is Building a More Accessible World

#artificialintelligence | Jun-29-2020, 22:16:22 GMT

In a world where new technology emerges at exponential rates, and our daily lives are increasingly mediated by speakers and sound waves, text to speech technology is the latest force evolving the way we communicate. Text to speech technology refers to a field of computer science that enables the conversion of language text into audible speech. Also known as voice computing, text to speech (TTS) often involves building a database of recorded human speech to train a computer to produce sound waves that resemble the natural sound of a human speaking. This process is called speech synthesis. The technology is trailblazing and major breakthroughs in the field occur regularly.

artificial intelligence, optical character recognition, speech technology, (10 more...)

#artificialintelligence

Country: North America > United States (0.05)

Industry:

Health & Medicine (0.49)
Leisure & Entertainment (0.48)
Media (0.47)

Technology:

Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (1.00)
