AITopics | speech evaluation

Collaborating Authors

speech evaluation

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Automated evaluation of children's speech fluency for low-resource languages

Zhang, Bowen, Latiff, Nur Afiqah Abdul, Kan, Justin, Tong, Rong, Soh, Donny, Miao, Xiaoxiao, McLoughlin, Ian

arXiv.org Artificial IntelligenceOct-24-2025

Assessment of children's speaking fluency in education is well researched for majority languages, but remains highly challenging for low resource languages. This paper propose s a system to automatically assess fluency by combining a fine-tuned multilingual ASR model, an objective metrics extract ion stage, and a generative pre-trained transformer (GPT) netw ork. The objective metrics include phonetic and word error rates, speech rate, and speech-pause duration ratio. These are interpreted by a GPT -based classifier guided by a small set of human-evaluated ground truth examples, to score fluency. We evaluate the proposed system on a dataset of children's spee ch in two low-resource languages, Tamil and Malay and compare the classification performance against Random Forest and XG - Boost, as well as using ChatGPT -4o to predict fluency directl y from speech input. Results demonstrate that the proposed ap - proach achieves significantly higher accuracy than multimo dal GPT or other methods.

large language model, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

doi: 10.21437/Interspeech.2025-1358

2505.19671

Country: Asia (0.30)

Genre: Research Report > New Finding (0.48)

Industry: Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Beyond Survival: Evaluating LLMs in Social Deduction Games with Human-Aligned Strategies

Song, Zirui, Huang, Yuan, Liu, Junchang, Luo, Haozhe, Wang, Chenxi, Gao, Lang, Xu, Zixiang, Han, Mingfei, Chang, Xiaojun, Chen, Xiuying

arXiv.org Artificial IntelligenceOct-14-2025

Social deduction games like Werewolf combine language, reasoning, and strategy, providing a testbed for studying natural language and social intelligence. However, most studies reduce the game to LLM-based self-play, yielding templated utterances and anecdotal cases that overlook the richness of social gameplay. Evaluation further relies on coarse metrics such as survival time or subjective scoring due to the lack of quality reference data. To address these gaps, we curate a high-quality, human-verified multimodal Werewolf dataset containing over 100 hours of video, 32.4M utterance tokens, and 15 rule variants. Based on this dataset, we propose a novel strategy-alignment evaluation that leverages the winning faction's strategies as ground truth in two stages: 1) Speech evaluation, formulated as multiple-choice-style tasks that assess whether the model can adopt appropriate stances across five dimensions of social ability; and 2) Decision evaluation, which assesses the model's voting choices and opponent-role inferences. This framework enables a fine-grained evaluation of models' linguistic and reasoning capabilities, while capturing their ability to generate strategically coherent gameplay. Our experiments show that state-of-the-art LLMs show diverse performance, with roughly half remain below 0.50, revealing clear gaps in deception and counterfactual reasoning. We hope our dataset further inspires research on language, reasoning, and strategy in multi-agent interaction.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.11389

Genre: Research Report (1.00)

Industry: Leisure & Entertainment > Games (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Semi-supervised Learning For Robust Speech Evaluation

Zhang, Huayun, Wong, Jeremy H. M., Lin, Geyu, Chen, Nancy F.

arXiv.org Artificial IntelligenceSep-22-2024

Speech evaluation measures a learners oral proficiency using automatic models. Corpora for training such models often pose sparsity challenges given that there often is limited scored data from teachers, in addition to the score distribution across proficiency levels being often imbalanced among student cohorts. Automatic scoring is thus not robust when faced with under-represented samples or out-of-distribution samples, which inevitably exist in real-world deployment scenarios. This paper proposes to address such challenges by exploiting semi-supervised pre-training and objective regularization to approximate subjective evaluation criteria. In particular, normalized mutual information is used to quantify the speech characteristics from the learner and the reference. An anchor model is trained using pseudo labels to predict the correctness of pronunciation. An interpolated loss function is proposed to minimize not only the prediction error with respect to ground-truth scores but also the divergence between two probability distributions estimated by the speech evaluation model and the anchor model. Compared to other state-of-the-art methods on a public data-set, this approach not only achieves high performance while evaluating the entire test-set as a whole, but also brings the most evenly distributed prediction error across distinct proficiency levels. Furthermore, empirical results show the model accuracy on out-of-distribution data also compares favorably with competitive baselines.

anchor model, evaluation, speech evaluation, (15 more...)

arXiv.org Artificial Intelligence

2409.14666

Country:

Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)

Add feedback