AITopics | speech pattern

Collaborating Authors

speech pattern

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Efficient Hate Speech Detection: A Three-Layer LoRA-Tuned BERTweet Framework

El-Bahnasawi, Mahmoud

arXiv.org Artificial IntelligenceNov-11-2025

This paper addresses the critical challenge of developing computationally efficient hate speech detection systems that maintain competitive performance while being practical for real-time deployment. We propose a novel three-layer framework that combines rule-based pre-filtering with a parameter-efficient LoRA-tuned BERTweet model and continuous learning capabilities. Our approach achieves 0.85 macro F1 score - representing 94% of the performance of state-of-the-art large language models like SafePhi (Phi-4 based) while using a base model that is 100x smaller (134M vs 14B parameters). Compared to traditional BERT-based approaches with similar computational requirements, our method demonstrates superior performance through strategic dataset unification and optimized fine-tuning. The system requires only 1.87M trainable parameters (1.37% of full fine-tuning) and trains in approximately 2 hours on a single T4 GPU, making robust hate speech detection accessible in resource-constrained environments while maintaining competitive accuracy for real-world deployment.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2511.06051

Country: North America > Mexico > Mexico City (0.14)

Genre: Research Report (0.64)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.37)

Add feedback

Cross-Lingual Multi-Granularity Framework for Interpretable Parkinson's Disease Diagnosis from Speech

Tougui, Ilias, Zakroum, Mehdi, Ghogho, Mounir

arXiv.org Artificial IntelligenceOct-7-2025

Parkinson's Disease (PD) affects over 10 million people worldwide, with speech impairments in up to 89% of patients. Current speech-based detection systems analyze entire utterances, potentially overlooking the diagnostic value of specific phonetic elements. We developed a granularity-aware approach for multilingual PD detection using an automated pipeline that extracts time-aligned phonemes, syllables, and words from recordings. Using Italian, Spanish, and English datasets, we implemented a bidirectional LSTM with multi-head attention to compare diagnostic performance across the different granularity levels. Phoneme-level analysis achieved superior performance with AUROC of 93.78% +- 2.34% and accuracy of 92.17% +- 2.43%. This demonstrates enhanced diagnostic capability for cross-linguistic PD detection. Importantly, attention analysis revealed that the most informative speech features align with those used in established clinical protocols: sustained vowels (/a/, /e/, /o/, /i/) at phoneme level, diadochokinetic syllables (/ta/, /pa/, /la/, /ka/) at syllable level, and /pataka/ sequences at word level. Source code will be available at https://github.com/jetliqs/clearpd.

artificial intelligence, machine learning, syllable, (18 more...)

arXiv.org Artificial Intelligence

2510.03758

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)

Add feedback

Idiosyncratic Versus Normative Modeling of Atypical Speech Recognition: Dysarthric Case Studies

Raja, Vishnu, Ganesan, Adithya V, Syamkumar, Anand, Banerjee, Ritwik, Schwartz, H Andrew

arXiv.org Artificial IntelligenceSep-23-2025

State-of-the-art automatic speech recognition (ASR) models like Whisper, perform poorly on atypical speech, such as that produced by individuals with dysarthria. Past works for atypical speech have mostly investigated fully personalized (or idiosyncratic) models, but modeling strategies that can both generalize and handle idiosyncracy could be more effective for capturing atypical speech. To investigate this, we compare four strategies: (a) $\textit{normative}$ models trained on typical speech (no personalization), (b) $\textit{idiosyncratic}$ models completely personalized to individuals, (c) $\textit{dysarthric-normative}$ models trained on other dysarthric speakers, and (d) $\textit{dysarthric-idiosyncratic}$ models which combine strategies by first modeling normative patterns before adapting to individual speech. In this case study, we find the dysarthric-idiosyncratic model performs better than idiosyncratic approach while requiring less than half as much personalized data (36.43 WER with 128 train size vs 36.99 with 256). Further, we found that tuning the speech encoder alone (as opposed to the LM decoder) yielded the best results reducing word error rate from 71% to 32% on average. Our findings highlight the value of leveraging both normative (cross-speaker) and idiosyncratic (speaker-specific) patterns to improve ASR for underrepresented speech populations.

artificial intelligence, deep learning, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2509.16718

Country: North America > United States (0.69)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine (1.00)
Government (0.69)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data

Postma, Emmy, Tejedor-Garcia, Cristian

arXiv.org Artificial IntelligenceSep-1-2025

Speech impairments are prevalent biomarkers for Parkinson's Disease (PD), motivating the development of diagnostic techniques using speech data for clinical applications. Although deep acoustic features have shown promise for PD classification, their effectiveness often varies due to individual speaker differences, a factor that has not been thoroughly explored in the existing literature. This study investigates the effectiveness of three pre-trained audio embeddings (OpenL3, VGGish and Wav2Vec2.0 models) for PD classification. Using the NeuroVoz dataset, OpenL3 outperforms others in diadochokinesis (DDK) and listen and repeat (LR) tasks, capturing critical acoustic features for PD detection. Only Wav2Vec2.0 shows significant gender bias, achieving more favorable results for male speakers, in DDK tasks. The misclassified cases reveal challenges with atypical speech patterns, highlighting the need for improved feature extraction and model robustness in PD detection.

artificial intelligence, classifier, machine learning, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.21437/Interspeech.2025-801

2506.02078

Country: Europe > Netherlands (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.47)

Add feedback

Fairness of Automatic Speech Recognition: Looking Through a Philosophical Lens

Choi, Anna Seo Gyeong, Choi, Hoon

arXiv.org Artificial IntelligenceAug-14-2025

Automatic Speech Recognition (ASR) systems now mediate countless human-technology interactions, yet research on their fairness implications remains surprisingly limited. This paper examines ASR bias through a philosophical lens, arguing that systematic misrecognition of certain speech varieties constitutes more than a technical limitation -- it represents a form of disrespect that compounds historical injustices against marginalized linguistic communities. We distinguish between morally neutral classification (discriminate1) and harmful discrimination (discriminate2), demonstrating how ASR systems can inadvertently transform the former into the latter when they consistently misrecognize non-standard dialects. We identify three unique ethical dimensions of speech technologies that differentiate ASR bias from other algorithmic fairness concerns: the temporal burden placed on speakers of non-standard varieties ("temporal taxation"), the disruption of conversational flow when systems misrecognize speech, and the fundamental connection between speech patterns and personal/cultural identity. These factors create asymmetric power relationships that existing technical fairness metrics fail to capture. The paper analyzes the tension between linguistic standardization and pluralism in ASR development, arguing that current approaches often embed and reinforce problematic language ideologies. We conclude that addressing ASR bias requires more than technical interventions; it demands recognition of diverse speech varieties as legitimate forms of expression worthy of technological accommodation. This philosophical reframing offers new pathways for developing ASR systems that respect linguistic diversity and speaker autonomy.

artificial intelligence, discrimination, speech recognition, (12 more...)

arXiv.org Artificial Intelligence

2508.07143

Country: North America > United States (1.00)

Genre: Research Report (1.00)

Industry:

Law (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Health & Medicine > Therapeutic Area (0.93)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

Add feedback

Will AI shape the way we speak? The emerging sociolinguistic influence of synthetic voices

Székely, Éva, Miniota, Jūra, Míša, null, Hejná, null

arXiv.org Artificial IntelligenceJun-10-2025

The growing prevalence of conversational voice interfaces, powered by developments in both speech and language technologies, raises important questions about their influence on human communication. While written communication can signal identity through lexical and stylistic choices, voice-based interactions inherently amplify socioindexical elements - such as accent, intonation, and speech style - which more prominently convey social identity and group affiliation. There is evidence that even passive media such as television is likely to influence the audience's linguistic patterns. Unlike passive media, conversational AI is interactive, creating a more immersive and reciprocal dynamic that holds a greater potential to impact how individuals speak in everyday interactions. Such heightened influence can be expected to arise from phenomena such as acoustic-prosodic entrainment and linguistic accommodation, which occur naturally during interaction and enable users to adapt their speech patterns in response to the system. While this phenomenon is still emerging, its potential societal impact could provide organisations, movements, and brands with a subtle yet powerful avenue for shaping and controlling public perception and social identity. We argue that the socioindexical influence of AI-generated speech warrants attention and should become a focus of interdisciplinary research, leveraging new and existing methodologies and technologies to better understand its implications.

artificial intelligence, chatbot, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.1065

Country: Europe (0.46)

Genre: Research Report (1.00)

Industry: Social Sector (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.47)

Add feedback

Sounding Like a Winner? Prosodic Differences in Post-Match Interviews

Kakouros, Sofoklis, Chen, Haoyu

arXiv.org Artificial IntelligenceJun-4-2025

This study examines the prosodic characteristics associated with winning and losing in post-match tennis interviews. Additionally, this research explores the potential to classify match outcomes solely based on post-match interview recordings using prosodic features and self-supervised learning (SSL) representations. By analyzing prosodic elements such as pitch and intensity, alongside SSL models like Wav2Vec 2.0 and HuBERT, the aim is to determine whether an athlete has won or lost their match. Traditional acoustic features and deep speech representations are extracted from the data, and machine learning classifiers are employed to distinguish between winning and losing players. Results indicate that SSL representations effectively differentiate between winning and losing outcomes, capturing subtle speech patterns linked to emotional states. At the same time, prosodic cues -- such as pitch variability -- remain strong indicators of victory.

artificial intelligence, machine learning, representation, (17 more...)

arXiv.org Artificial Intelligence

2506.02283

Country: Europe > Finland (0.15)

Genre: Research Report (1.00)

Industry:

Health & Medicine (0.46)
Leisure & Entertainment > Sports > Tennis (0.35)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

Interpretable Early Detection of Parkinson's Disease through Speech Analysis

Simone, Lorenzo, Camporeale, Mauro Giuseppe, Rubino, Vito Marco, Gervasi, Vincenzo, Dimauro, Giovanni

arXiv.org Artificial IntelligenceApr-25-2025

Parkinson's disease is a progressive neurodegenerative disorder affecting motor and non-motor functions, with speech impairments among its earliest symptoms. Speech impairments offer a valuable diagnostic opportunity, with machine learning advances providing promising tools for timely detection. In this research, we propose a deep learning approach for early Parkinson's disease detection from speech recordings, which also highlights the vocal segments driving predictions to enhance interpretability. This approach seeks to associate predictive speech patterns with articulatory features, providing a basis for interpreting underlying neuromuscular impairments. We evaluated our approach using the Italian Parkinson's Voice and Speech Database, containing 831 audio recordings from 65 participants, including both healthy individuals and patients. Our approach showed competitive classification performance compared to state-of-the-art methods, while providing enhanced interpretability by identifying key speech features influencing predictions.

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2504.17739

Country: Europe > Italy (0.29)

Genre: Research Report > Promising Solution (0.34)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bilingual Dual-Head Deep Model for Parkinson's Disease Detection from Speech

La Quatra, Moreno, Orozco-Arroyave, Juan Rafael, Siniscalchi, Marco Sabato

arXiv.org Artificial IntelligenceMar-13-2025

This work aims to tackle the Parkinson's disease (PD) detection problem from the speech signal in a bilingual setting by proposing an ad-hoc dual-head deep neural architecture for type-based binary classification. One head is specialized for diadochokinetic patterns. The other head looks for natural speech patterns present in continuous spoken utterances. Only one of the two heads is operative accordingly to the nature of the input. Speech representations are extracted from self-supervised learning (SSL) models and wavelet transforms. Adaptive layers, convolutional bottlenecks, and contrastive learning are exploited to reduce variations across languages. Our solution is assessed against two distinct datasets, EWA-DB, and PC-GITA, which cover Slovak and Spanish languages, respectively. Results indicate that conventional models trained on a single language dataset struggle with cross-linguistic generalization, and naive combinations of datasets are suboptimal. In contrast, our model improves generalization on both languages, simultaneously.

dataset, parkinson, speech, (13 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ICASSP49660.2025.10889445

2503.10301

Country:

South America > Colombia > Antioquia Department > Medellín (0.04)
North America > United States (0.04)
Europe > Norway > Central Norway > Trøndelag > Trondheim (0.04)
Europe > Italy > Sicily > Palermo (0.04)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Therapeutic Area > Musculoskeletal (1.00)
Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (0.89)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The order in speech disorder: a scoping review of state of the art machine learning methods for clinical speech classification

Moell, Birger, Aronsson, Fredrik Sand, Östberg, Per, Beskow, Jonas

arXiv.org Artificial IntelligenceMar-3-2025

Background:Speech patterns have emerged as potential diagnostic markers for conditions with varying etiologies. Machine learning (ML) presents an opportunity to harness these patterns for accurate disease diagnosis. Objective: This review synthesized findings from studies exploring ML's capability in leveraging speech for the diagnosis of neurological, laryngeal and mental disorders. Methods: A systematic examination of 564 articles was conducted with 91 articles included in the study, which encompassed a wide spectrum of conditions, ranging from voice pathologies to mental and neurological disorders. Methods for speech classifications were assessed based on the relevant studies and scored between 0-10 based on the reported diagnostic accuracy of their ML models. Results: High diagnostic accuracies were consistently observed for laryngeal disorders, dysarthria, and changes related to speech in Parkinsons disease. These findings indicate the robust potential of speech as a diagnostic tool. Disorders like depression, schizophrenia, mild cognitive impairment and Alzheimers dementia also demonstrated high accuracies, albeit with some variability across studies. Meanwhile, disorders like OCD and autism highlighted the need for more extensive research to ascertain the relationship between speech patterns and the respective conditions. Conclusion: ML models utilizing speech patterns demonstrate promising potential in diagnosing a range of mental, laryngeal, and neurological disorders. However, the efficacy varies across conditions, and further research is needed. The integration of these models into clinical practice could potentially revolutionize the evaluation and diagnosis of a number of different medical conditions.

accuracy, disorder, speech, (17 more...)

arXiv.org Artificial Intelligence

2503.04802

Country:

Europe > Germany > Saarland > Saarbrücken (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(3 more...)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)
Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)
Health & Medicine > Therapeutic Area > Neurology > Parkinson's Disease (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (0.68)

Add feedback