AITopics | intelligibility level

Collaborating Authors

intelligibility level

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Variational Low-Rank Adaptation for Personalized Impaired Speech Recognition

Pokel, Niclas, Moure, Pehuén, Boehringer, Roman, Liu, Shih-Chii, Gao, Yingqiang

arXiv.org Artificial IntelligenceSep-26-2025

Speech impairments resulting from congenital disorders, such as cerebral palsy, down syndrome, or apert syndrome, as well as acquired brain injuries due to stroke, traumatic accidents, or tumors, present major challenges to automatic speech recognition (ASR) systems. Despite recent advancements, state-of-the-art ASR models like Whisper still struggle with non-normative speech due to limited training data availability and high acoustic variability. Moreover, collecting and annotating non-normative speech is burdensome: speaking is effortful for many affected individuals, while laborious annotation often requires caregivers familiar with the speaker. This work introduces a novel ASR personalization method based on Bayesian Low-rank Adaptation for data-efficient fine-tuning. We validate our method on the English UA-Speech dataset and a newly collected German speech dataset, BF-Sprache, from a child with structural speech impairment. The dataset and approach are designed to reflect the challenges of low-resource settings that include individuals with speech impairments. Our method significantly improves ASR accuracy for impaired speech while maintaining data and annotation efficiency, offering a practical path toward inclusive ASR.

artificial intelligence, machine learning, speech, (15 more...)

arXiv.org Artificial Intelligence

2509.20397

Country:

Europe > Switzerland > Zürich > Zürich (0.15)
Asia > Middle East > Jordan (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine > Therapeutic Area > Neurology (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Enhancing Dysarthric Speech Recognition for Unseen Speakers via Prototype-Based Adaptation

Wang, Shiyao, Zhao, Shiwan, Zhou, Jiaming, Kong, Aobo, Qin, Yong

arXiv.org Artificial IntelligenceJul-25-2024

Dysarthric speech recognition (DSR) presents a formidable challenge due to inherent inter-speaker variability, leading to severe performance degradation when applying DSR models to new dysarthric speakers. Traditional speaker adaptation methodologies typically involve fine-tuning models for each speaker, but this strategy is cost-prohibitive and inconvenient for disabled users, requiring substantial data collection. To address this issue, we introduce a prototype-based approach that markedly improves DSR performance for unseen dysarthric speakers without additional fine-tuning. Our method employs a feature extractor trained with HuBERT to produce per-word prototypes that encapsulate the characteristics of previously unseen speakers. These prototypes serve as the basis for classification. Additionally, we incorporate supervised contrastive learning to refine feature extraction. By enhancing representation quality, we further improve DSR performance, enabling effective personalized DSR. We release our code at https://github.com/NKU-HLT/PB-DSR.

dysarthric speaker, speech, unseen dysarthric speaker, (14 more...)

arXiv.org Artificial Intelligence

2407.18461

Country: Asia > China (0.04)

Genre: Research Report (0.85)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

On combining acoustic and modulation spectrograms in an attention LSTM-based system for speech intelligibility level classification

Gallardo-Antolín, Ascensión, Montero, Juan M.

arXiv.org Artificial IntelligenceFeb-5-2024

Speech intelligibility can be affected by multiple factors, such as noisy environments, channel distortions or physiological issues. In this work, we deal with the problem of automatic prediction of the speech intelligibility level in this latter case. Starting from our previous work, a non-intrusive system based on LSTM networks with attention mechanism designed for this task, we present two main contributions. In the first one, it is proposed the use of per-frame modulation spectrograms as input features, instead of compact representations derived from them that discard important temporal information. In the second one, two different strategies for the combination of per-frame acoustic log-mel and modulation spectrograms into the LSTM framework are explored: at decision level or late fusion and at utterance level or Weighted-Pooling (WP) fusion. The proposed models are evaluated with the UA-Speech database that contains dysarthric speech with different degrees of severity. On the one hand, results show that attentional LSTM networks are able to adequately modeling the modulation spectrograms sequences producing similar classification rates as in the case of log-mel spectrograms. On the other hand, both combination strategies, late and WP fusion, outperform the single-feature systems, suggesting that per-frame log-mel and modulation spectrograms carry complementary information for the task of speech intelligibility prediction, than can be effectively exploited by the LSTM-based architectures, being the system with the WP fusion strategy and Attention-Pooling the one that achieves best results.

log-mel and modulation spectrogram, modulation spectrogram, spectrogram, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.neucom.2021.05.065

2402.02865

Country:

Europe > Spain > Galicia > Madrid (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > South Korea (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

An Attention Long Short-Term Memory based system for automatic classification of speech intelligibility

Fernández-Díaz, Miguel, Gallardo-Antolín, Ascensión

arXiv.org Artificial IntelligenceFeb-5-2024

Speech intelligibility can be degraded due to multiple factors, such as noisy environments, technical difficulties or biological conditions. This work is focused on the development of an automatic non-intrusive system for predicting the speech intelligibility level in this latter case. The main contribution of our research on this topic is the use of Long Short-Term Memory (LSTM) networks with log-mel spectrograms as input features for this purpose. In addition, this LSTM-based system is further enhanced by the incorporation of a simple attention mechanism that is able to determine the more relevant frames to this task. The proposed models are evaluated with the UA-Speech database that contains dysarthric speech with different degrees of severity. Results show that the attention LSTM architecture outperforms both, a reference Support Vector Machine (SVM)-based system with hand-crafted features and a LSTM-based system with Mean-Pooling.

classification, intelligibility, intelligibility level, (15 more...)

arXiv.org Artificial Intelligence

doi: 10.1016/j.engappai.2020.103976

2402.0285

Country:

Europe > Spain > Galicia > Madrid (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > South Korea (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback