AITopics | intelligibility score

intelligibility score

Leveraging Multiple Speech Enhancers for Non-Intrusive Intelligibility Prediction for Hearing-Impaired Listeners

Cao, Boxuan, Li, Linkai, Yu, Hanlin, Mo, Changgeng, Zhou, Haoshuai, Wang, Shan Xiang

arXiv.org Artificial Intelligence, Sep-23-2025

Speech intelligibility evaluation for hearing-impaired (HI) listeners is essential for assessing hearing aid performance, traditionally relying on listening tests or intrusive methods like HASPI. However, these methods require clean reference signals, which are often unavailable in real-world conditions, creating a gap between lab-based and real-world assessments. To address this, we propose a non-intrusive intelligibility prediction framework that leverages speech enhancers to provide a parallel enhanced-signal pathway, enabling robust predictions without reference signals. We evaluate three state-of-the-art enhancers and demonstrate that prediction performance depends on the choice of enhancer, with ensembles of strong enhancers yielding the best results. To improve cross-dataset generalization, we introduce a 2-clips augmentation strategy that enhances listener-specific variability, boosting robustness on unseen datasets. Our approach consistently outperforms the non-intrusive baseline, CPC2 Champion, across multiple datasets, highlighting the potential of enhancer-guided non-intrusive intelligibility prediction for real-world applications.

artificial intelligence, intelligibility prediction, machine learning, (15 more...)

2509.16979

Country: North America (0.28)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Therapeutic Area (0.37)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.94)

No Audiogram: Leveraging Existing Scores for Personalized Speech Intelligibility Prediction

Zhou, Haoshuai, Mo, Changgeng, Cao, Boxuan, Li, Linkai, Wang, Shan Xiang

arXiv.org Artificial Intelligence, Jun-4-2025

Personalized speech intelligibility prediction is challenging. Previous approaches have mainly relied on audiograms, which are inherently limited in accuracy as they only capture a listener's hearing threshold for pure tones. Rather than incorporating additional listener features, we propose a novel approach that leverages an individual's existing intelligibility data to predict their performance on new audio. We introduce the Support Sample-Based Intelligibility Prediction Network (SSIPNet), a deep learning model that leverages speech foundation models to build a high-dimensional representation of a listener's speech recognition ability from multiple support (audio, score) pairs, enabling accurate predictions for unseen audio. Results on the Clarity Prediction Challenge dataset show that, even with a small number of support (audio, score) pairs, our method outperforms audiogram-based predictions. Our work presents a new paradigm for personalized speech intelligibility prediction.

artificial intelligence, intelligibility score, machine learning, (15 more...)

2506.02039

Country:

North America > United States > Florida > Hillsborough County > University (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre: Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Therapeutic Area (0.97)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Crowdsourced Multilingual Speech Intelligibility Testing

Lechler, Laura, Wojcicki, Kamil

arXiv.org Artificial Intelligence, Mar-21-2024

With the advent of generative audio features, there is an increasing need for rapid evaluation of their impact on speech intelligibility. Beyond the existing laboratory measures, which are expensive and do not scale well, there has been comparatively little work on crowdsourced assessment of intelligibility. Standards and recommendations are yet to be defined, and publicly available multilingual test materials are lacking. In response to this challenge, we propose an approach for a crowdsourced intelligibility assessment. We detail the test design, the collection and public release of the multilingual speech data, and the results of our early experiments.

assessment, intelligibility, participant, (16 more...)

doi: 10.1109/ICASSP48485.2024.10447869

2403.14817

Country:

Europe > Greece (0.04)
North America > United States > Rhode Island (0.04)
North America > United States > Massachusetts > Middlesex County > Sudbury (0.04)
(7 more...)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Social Media > Crowdsourcing (1.00)

BASPRO: a balanced script producer for speech corpus collection based on the genetic algorithm

Chen, Yu-Wen, Wang, Hsin-Min, Tsao, Yu

arXiv.org Artificial Intelligence, Dec-10-2022

The performance of speech-processing models is heavily influenced by the speech corpus that is used for training and evaluation. In this study, we propose the BAlanced Script PROducer (BASPRO) system, which can automatically construct a phonetically balanced and rich set of Chinese sentences for collecting Mandarin Chinese speech data. First, we used pretrained natural language processing systems to extract ten-character candidate sentences from a large corpus of Chinese news texts. Then, we applied a genetic algorithm-based method to select 20 phonetically balanced sentence sets, each containing 20 sentences, from the candidate sentences. Using BASPRO, we obtained a recording script called TMNews, which contains 400 ten-character sentences. TMNews covers 84% of the syllables used in the real world. Moreover, the syllable distribution has 0.96 cosine similarity to the real-world syllable distribution. We converted the script into a speech corpus using two text-to-speech systems. Using the designed speech corpus, we tested the performance of speech enhancement (SE) and automatic speech recognition (ASR), which are among the most important regression- and classification-based speech processing tasks, respectively. The experimental results show that the SE and ASR models trained on the designed speech corpus outperform their counterparts trained on a randomly composed speech corpus.
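
The two statistics quoted in this abstract, syllable coverage and the cosine similarity between syllable distributions, can be illustrated with a minimal Python sketch. The toy syllable lists and the function names `syllable_coverage` and `cosine_similarity` are assumptions made for illustration only; they are not taken from the BASPRO paper or its code, and the paper may define coverage differently.

```python
import math
from collections import Counter


def syllable_coverage(script_syllables, reference_syllables):
    """Fraction of distinct reference syllables that also appear in the script."""
    reference = set(reference_syllables)
    return len(reference & set(script_syllables)) / len(reference)


def cosine_similarity(script_syllables, reference_syllables):
    """Cosine similarity between the two syllable-count vectors."""
    a, b = Counter(script_syllables), Counter(reference_syllables)
    dot = sum(a[s] * b[s] for s in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)


# Toy data (hypothetical): pinyin-like syllables, not real corpus statistics.
reference = ["ma", "ma", "shi", "shi", "shi", "de", "de", "guo", "xue"]
script = ["ma", "shi", "shi", "de", "guo"]

coverage = syllable_coverage(script, reference)
similarity = cosine_similarity(script, reference)
print(f"coverage = {coverage:.2f}, cosine similarity = {similarity:.2f}")
```

In this sketch, coverage counts distinct syllables only, while cosine similarity also reflects how often each syllable occurs, so a script can cover most syllables and still score low on similarity if its frequencies are skewed.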

evolutionary algorithm, machine learning, natural language, (20 more...)

2301.0412

Country:

North America > United States > New York (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.95)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

MTI-Net: A Multi-Target Speech Intelligibility Prediction Model

Zezario, Ryandhimas E., Fu, Szu-wei, Chen, Fei, Fuh, Chiou-Shann, Wang, Hsin-Min, Tsao, Yu

arXiv.org Artificial Intelligence, Aug-30-2022

Recently, deep learning (DL)-based non-intrusive speech assessment models have attracted great attention. Many studies report that these DL-based models yield satisfactory assessment performance and good flexibility, but their performance in unseen environments remains a challenge. Furthermore, compared to quality scores, fewer studies have developed deep learning models to estimate intelligibility scores. This study proposes a multi-task speech intelligibility prediction model, called MTI-Net, for simultaneously predicting human and machine intelligibility measures. Specifically, given a speech utterance, MTI-Net is designed to predict human subjective listening test results and word error rate (WER) scores. We also investigate several methods that can improve the prediction performance of MTI-Net. First, we compare different features (including low-level features and embeddings from self-supervised learning (SSL) models) and prediction targets of MTI-Net. Second, we explore the effect of transfer learning and multi-task learning on training MTI-Net. Finally, we examine the potential advantages of fine-tuning SSL embeddings. Experimental results demonstrate the effectiveness of using cross-domain features, multi-task learning, and fine-tuning SSL embeddings. Furthermore, it is confirmed that the intelligibility and WER scores predicted by MTI-Net are highly correlated with the ground-truth scores.
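
The word error rate (WER) that serves as the machine intelligibility target here is usually computed as a word-level edit distance divided by the reference length. The following is a minimal sketch of that standard definition; the function name `word_error_rate` and the example sentences are illustrative assumptions, and the paper's own ASR scoring pipeline may apply different text normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical transcript pair: one substitution and one deletion.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"WER = {wer:.3f}")
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.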

intelligibility, mti-net, ssl, (15 more...)

2204.0331

Country:

Asia > Taiwan (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.72)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MBI-Net: A Non-Intrusive Multi-Branched Speech Intelligibility Prediction Model for Hearing Aids

Zezario, Ryandhimas E., Chen, Fei, Fuh, Chiou-Shann, Wang, Hsin-Min, Tsao, Yu

arXiv.org Artificial Intelligence, Aug-30-2022

Improving the user's hearing ability to understand speech in noisy environments is critical to the development of hearing aid (HA) devices. For this, it is important to derive a metric that can fairly predict speech intelligibility for HA users. A straightforward approach is to conduct a subjective listening test and use the test results as an evaluation metric. However, conducting large-scale listening tests is time-consuming and expensive. Therefore, several evaluation metrics were derived as surrogates for subjective listening test results. In this study, we propose a multi-branched speech intelligibility prediction model (MBI-Net), for predicting the subjective intelligibility scores of HA users. MBI-Net consists of two branches of models, with each branch consisting of a hearing loss model, a cross-domain feature extraction module, and a speech intelligibility prediction model, to process speech signals from one channel. The outputs of the two branches are fused through a linear layer to obtain predicted speech intelligibility scores. Experimental results confirm the effectiveness of MBI-Net, which produces higher prediction scores than the baseline system in Track 1 and Track 2 on the Clarity Prediction Challenge 2022 dataset.

intelligibility, intelligibility prediction model, mbi-net, (13 more...)

2204.03305

Country:

Asia > Taiwan (0.04)
Asia > China (0.04)

Genre: Research Report > New Finding (0.69)

Industry:

Health & Medicine > Therapeutic Area > Otolaryngology (0.57)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.40)
Health & Medicine > Therapeutic Area > Immunology (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

The Phonetic Footprint of Parkinson's Disease

Klumpp, Philipp, Arias-Vergara, Tomás, Vásquez-Correa, Juan Camilo, Pérez-Toro, Paula Andrea, Orozco-Arroyave, Juan Rafael, Batliner, Anton, Nöth, Elmar

arXiv.org Artificial Intelligence, Dec-21-2021

As one of the most prevalent neurodegenerative disorders, Parkinson's disease (PD) has a significant impact on the fine motor skills of patients. The complex interplay of different articulators during speech production and the realization of the required muscle tension become increasingly difficult, thus leading to dysarthric speech. Characteristic patterns such as vowel instability, slurred pronunciation and slow speech can often be observed in the affected individuals and were analyzed in previous studies to determine the presence and progression of PD. In this work, we used a phonetic recognizer trained exclusively on healthy speech data to investigate how PD affected the phonetic footprint of patients. We rediscovered numerous patterns that had been described in previous contributions, although our system had never seen any pathological speech previously. Furthermore, we could show that intermediate activations from the neural network could serve as feature vectors encoding information related to the disease state of individuals. We were also able to directly correlate the expert-rated intelligibility of a speaker with the mean confidence of phonetic predictions. Our results support the assumption that pathological data is not necessarily required to train systems that are capable of analyzing PD speech.
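
The correlation between expert-rated intelligibility and mean phonetic confidence mentioned in this abstract can be sketched as follows. The abstract does not say which correlation coefficient was used, so Pearson's r is an assumption here, and the per-frame confidences, speaker IDs, and ratings are invented toy values rather than data from the paper.

```python
import math


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-frame confidence of the top phone prediction, per speaker.
frame_confidences = {
    "spk1": [0.9, 0.8, 1.0],
    "spk2": [0.6, 0.7, 0.8],
    "spk3": [0.4, 0.5, 0.6],
}
# Hypothetical expert intelligibility ratings (higher = more intelligible).
ratings = {"spk1": 5.0, "spk2": 4.0, "spk3": 2.0}

speakers = sorted(frame_confidences)
mean_conf = [mean(frame_confidences[s]) for s in speakers]
r = pearson(mean_conf, [ratings[s] for s in speakers])
print(f"Pearson r = {r:.3f}")
```

Averaging confidences per speaker before correlating yields one point per speaker, which matches the speaker-level granularity of an expert rating.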

parkinson, pd patient, probability, (16 more...)

doi: 10.1016/j.csl.2021.101321

2112.11514

Country:

South America > Colombia > Antioquia Department > Medellín (0.04)
North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Automatic Speaker Independent Dysarthric Speech Intelligibility Assessment System

Tripathi, Ayush, Bhosale, Swapnil, Kopparapu, Sunil Kumar

arXiv.org Artificial Intelligence, Mar-10-2021

Dysarthria is a condition that hampers the ability of an individual to control the muscles that play a major role in speech delivery. The loss of fine control over the muscles that assist the movement of the lips, vocal cords, tongue and diaphragm results in abnormal speech delivery. One can assess the severity level of dysarthria by analyzing the intelligibility of speech spoken by an individual. Continuous intelligibility assessment not only helps speech language pathologists study the impact of medication but also allows them to plan personalized therapy. It helps clinicians immensely if the intelligibility assessment system is reliable, automatic, and simple for (a) patients to undergo and (b) clinicians to interpret. The lack of dysarthric data has resulted in the development of speaker-dependent automatic intelligibility assessment systems, which require patients to speak a large number of utterances. In this paper, we propose (a) a cost minimization procedure to select an optimal (small) number of utterances that need to be spoken by the dysarthric patient, (b) four different speaker-independent intelligibility assessment systems that require the patient to speak a small number of words, and (c) an assessment score that is close to the perceptual score that the Speech Language Pathologist (SLP) can relate to. Requiring only a small number of utterances from the patient and producing a score that the SLP can relate to benefits both the dysarthric patient and the clinician from a usability perspective.

assessment, intelligibility, speech, (15 more...)

doi: 10.1016/j.csl.2021.101213

2103.06157

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > San Diego County > Oceanside (0.04)
North America > Canada > Quebec > Montreal (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
Education > Assessment & Standards > Assessment Methods (1.00)
Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (0.46)
Health & Medicine > Therapeutic Area > Oncology > Head & Neck Cancer (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
