Large Language Models for Depression Recognition in Spoken Language Integrating Psychological Knowledge
Li, Yupei, Shao, Shuaijie, Milling, Manuel, Schuller, Björn W.
Depression is a growing concern gaining attention in both public discourse and AI research. While deep neural networks (DNNs) have been applied to depression recognition, they still lack real-world effectiveness. Large language models (LLMs) show strong potential but require domain-specific fine-tuning and struggle with non-textual cues. Since depression is often expressed through vocal tone and behaviour rather than explicit text, relying on language alone is insufficient. Diagnostic accuracy also suffers without incorporating psychological expertise. To address these limitations, we present, to the best of our knowledge, the first application of LLMs to multimodal depression detection on the DAIC-WOZ dataset. We extract audio features with the pre-trained Wav2Vec model and map them to text-based LLMs for further processing. We also propose a novel strategy for incorporating psychological knowledge into LLMs, specifically using a question-and-answer set to grant authoritative knowledge to the LLMs. Our approach yields a notable improvement in both Mean Absolute Error (MAE) and Root Mean Square Error (RMSE) over the baseline reported in the original paper. The code is available at https://github.com/myxp-lyp/Depression-detection.git
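As a rough illustration of the audio-to-LLM mapping described above, the sketch below encodes a waveform with Wav2Vec and projects the hidden states into an LLM's embedding space. The checkpoint name, projection dimension, and fusion strategy are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch (assumptions: model name, projection size, fusion strategy).
import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class AudioToLLMProjector(nn.Module):
    """Maps Wav2Vec hidden states into an LLM's token-embedding space."""
    def __init__(self, llm_hidden_size: int = 4096):
        super().__init__()
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")
        self.proj = nn.Linear(self.encoder.config.hidden_size, llm_hidden_size)

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) at 16 kHz
        hidden = self.encoder(waveform).last_hidden_state  # (B, T, 768)
        return self.proj(hidden)                           # (B, T, llm_hidden_size)

# The projected sequence could then be prepended to the text prompt's
# embeddings before being fed to the (frozen or fine-tuned) LLM.
audio_tokens = AudioToLLMProjector()(torch.randn(1, 16000))
```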
Speech-Based Depressive Mood Detection in the Presence of Multiple Sclerosis: A Cross-Corpus and Cross-Lingual Study
Gonzalez-Machorro, Monica, Reichel, Uwe, Hecker, Pascal, Hammer, Helly, Sagha, Hesam, Eyben, Florian, Hoepner, Robert, Schuller, Björn W.
Depression commonly co-occurs with neurodegenerative disorders like Multiple Sclerosis (MS), yet the potential of speech-based Artificial Intelligence for detecting depression in such contexts remains unexplored. This study examines the transferability of speech-based depression detection methods to people with MS (pwMS) through cross-corpus and cross-lingual analysis using English data from the general population and German data from pwMS. Our approach implements supervised machine learning models using: 1) conventional speech and language features commonly used in the field, 2) emotional dimensions derived from a Speech Emotion Recognition (SER) model, and 3) exploratory speech feature analysis. Despite limited data, our models detect depressive mood in pwMS with moderate generalisability, achieving a 66% Unweighted Average Recall (UAR) on a binary task. Feature selection further improved performance, boosting UAR to 74%. Our findings also highlight the relevant role of emotional changes as an indicator of depressive mood, both in the general population and within pwMS. This study provides an initial exploration into generalising speech-based depression detection, even in the presence of co-occurring conditions such as neurodegenerative diseases.
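A minimal sketch of the cross-corpus evaluation setup follows: train on general-population data, test on pwMS data, and report UAR (macro-averaged recall). The feature dimensionality, selector, and classifier are illustrative assumptions, not the study's exact pipeline.

```python
# Illustrative cross-corpus evaluation with feature selection and UAR.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import recall_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_en, y_en = rng.normal(size=(200, 88)), rng.integers(0, 2, 200)  # general population (stand-in)
X_de, y_de = rng.normal(size=(60, 88)), rng.integers(0, 2, 60)    # pwMS corpus (stand-in)

# Train on the English general-population corpus, test cross-corpus on pwMS.
clf = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20), LogisticRegression())
clf.fit(X_en, y_en)

# Unweighted Average Recall = recall macro-averaged over both classes.
uar = recall_score(y_de, clf.predict(X_de), average="macro")
print(f"UAR: {uar:.2f}")
```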
Non-Verbal Vocalisations and their Challenges: Emotion, Privacy, Sparseness, and Real Life
Batliner, Anton, Amiriparian, Shahin, Schuller, Björn W.
Non-Verbal Vocalisations (NVVs) are short 'non-word' utterances without proper linguistic (semantic) meaning but conveying connotations -- be this emotions/affects or other paralinguistic information. We start this contribution with a historical sketch: how they were addressed in psychology and linguistics in the last two centuries, how they were neglected later on, and how they came to the fore with the advent of emotion research. We then give an overview of types of NVVs (formal aspects) and functions of NVVs, exemplified with the typical NVV 'ah'. Interesting as they are, NVVs come, however, with a number of challenges that should be accounted for: privacy and general ethical considerations prevent them from being recorded in real-life (private) scenarios to a sufficient extent. Isolated, prompted (acted) exemplars do not necessarily model NVVs in context; yet, this is the preferred strategy so far when modelling NVVs, especially in AI. To overcome these problems, we argue in favour of corpus-based approaches. This guarantees more realistic modelling; however, we are still faced with privacy and sparse-data problems.
Breaking Resource Barriers in Speech Emotion Recognition via Data Distillation
Chang, Yi, Ren, Zhao, Zhao, Zhonghao, Nguyen, Thanh Tam, Qian, Kun, Schultz, Tanja, Schuller, Björn W.
Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment of SER models. To address these challenges, we propose a data distillation framework to facilitate efficient development of SER models in IoT applications using a synthesised, smaller, and distilled dataset. Our experiments demonstrate that the distilled dataset can be effectively utilised to train SER models with fixed initialisation, achieving performances comparable to those developed using the original full emotional speech dataset.
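For intuition, here is a minimal gradient-matching sketch of dataset distillation: a small synthetic set is optimised so that the gradients it induces in a fixed-initialisation model match those of the full dataset. Dimensions and the objective are illustrative assumptions; the paper's actual framework and SER model may differ.

```python
# Gradient-matching sketch of dataset distillation (simplified: the model
# stays fixed here, whereas full methods also re-sample/update the network).
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_real, n_syn, feat_dim, n_classes = 512, 16, 128, 4

X_real = torch.randn(n_real, feat_dim)            # stand-in for speech features
y_real = torch.randint(0, n_classes, (n_real,))

# Learnable synthetic dataset, far smaller than the original.
X_syn = torch.randn(n_syn, feat_dim, requires_grad=True)
y_syn = torch.arange(n_syn) % n_classes           # fixed, balanced labels

model = nn.Linear(feat_dim, n_classes)            # fixed-initialisation model
opt = torch.optim.Adam([X_syn], lr=0.1)

for step in range(200):
    # Gradients of the loss on real vs. synthetic data w.r.t. model weights.
    g_real = torch.autograd.grad(
        F.cross_entropy(model(X_real), y_real), model.parameters())
    g_syn = torch.autograd.grad(
        F.cross_entropy(model(X_syn), y_syn), model.parameters(), create_graph=True)
    # Match the two gradient fields; only the synthetic samples are updated.
    loss = sum(F.mse_loss(a, b.detach()) for a, b in zip(g_syn, g_real))
    opt.zero_grad()
    loss.backward()
    opt.step()
```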
MELT: Towards Automated Multimodal Emotion Data Annotation by Leveraging LLM Embedded Knowledge
Jing, Xin, Wang, Jiadong, Tsangko, Iosif, Triantafyllopoulos, Andreas, Schuller, Björn W.
Although speech emotion recognition (SER) has advanced significantly with deep learning, annotation remains a major hurdle. Human annotation is not only costly but also subject to inconsistencies: annotators often have different preferences and may lack the necessary contextual knowledge, which can lead to varied and inaccurate labels. Meanwhile, Large Language Models (LLMs) have emerged as a scalable alternative for annotating text data. However, the potential of LLMs to perform emotional speech data annotation without human supervision has yet to be thoroughly investigated. To address these problems, we apply GPT-4o to annotate a multimodal dataset collected from the sitcom Friends, using only textual cues as inputs. By crafting structured text prompts, our methodology capitalises on the knowledge GPT-4o has accumulated during its training, showcasing that it can generate accurate and contextually relevant annotations without direct access to multimodal inputs. Accordingly, we propose MELT, a multimodal emotion dataset fully annotated by GPT-4o. We demonstrate the effectiveness of MELT by fine-tuning four self-supervised learning (SSL) backbones and assessing speech emotion recognition performance across emotion datasets. Additionally, the results of our subjective experiments demonstrate a consistent performance improvement on SER.
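A hedged sketch of text-only LLM annotation in this spirit is shown below, using the OpenAI Python SDK. The prompt wording and label set are assumptions for illustration, not the exact MELT protocol.

```python
# Text-only emotion annotation via GPT-4o (prompt and labels are illustrative).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PROMPT = """You are an emotion annotation expert.
Given the dialogue context and the target utterance, assign exactly one label
from: neutral, joy, sadness, anger, fear, surprise, disgust.
Answer with the label only.

Context: {context}
Utterance: "{utterance}"
Label:"""

def annotate(context: str, utterance: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # deterministic outputs aid annotation consistency
        messages=[{"role": "user",
                   "content": PROMPT.format(context=context, utterance=utterance)}],
    )
    return response.choices[0].message.content.strip()

print(annotate("Ross drops the wedding cake.", "Oh no, not again!"))
```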
MAD-UV: The 1st INTERSPEECH Mice Autism Detection via Ultrasound Vocalization Challenge
Yang, Zijiang, Song, Meishu, Jing, Xin, Zhang, Haojie, Qian, Kun, Hu, Bin, Tamada, Kota, Takumi, Toru, Schuller, Björn W., Yamamoto, Yoshiharu
The Mice Autism Detection via Ultrasound Vocalization (MAD-UV) Challenge introduces the first INTERSPEECH challenge focused on detecting autism spectrum disorder (ASD) in mice through their vocalizations. Participants are tasked with developing models to automatically classify mice as either wild-type or ASD models based on recordings with a high sampling rate. Our baseline system employs a simple CNN-based classification using three different spectrogram features. Results demonstrate the feasibility of automated ASD detection, with the considered audible-range features achieving the best performance (UAR of 0.600 for segment-level and 0.625 for subject-level classification). This challenge bridges speech technology and biomedical research, offering opportunities to advance our understanding of ASD models through machine learning approaches. The findings suggest promising directions for vocalization analysis and highlight the potential value of audible and ultrasound vocalizations in ASD detection.
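The following sketch mirrors a simple CNN-based classification over spectrogram segments with UAR as the metric; layer sizes and the input shape are illustrative assumptions, not the official challenge baseline.

```python
# Small CNN over spectrogram segments, scored with segment-level UAR.
import torch
import torch.nn as nn
from sklearn.metrics import recall_score

class SpectrogramCNN(nn.Module):
    def __init__(self, n_classes: int = 2):  # wild-type vs. ASD model
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            nn.Flatten(), nn.Linear(32, n_classes),
        )

    def forward(self, x):  # x: (batch, 1, freq_bins, frames)
        return self.net(x)

model = SpectrogramCNN()
segments = torch.randn(8, 1, 128, 256)   # stand-in spectrogram segments
labels = torch.randint(0, 2, (8,))
preds = model(segments).argmax(dim=1)

# Segment-level UAR = recall averaged over classes, robust to class imbalance.
uar = recall_score(labels.numpy(), preds.numpy(), average="macro")
```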
Towards Friendly AI: A Comprehensive Review and New Perspectives on Human-AI Alignment
Sun, Qiyang, Li, Yupei, Alturki, Emran, Murthy, Sunil Munthumoduku Krishna, Schuller, Björn W.
As Artificial Intelligence (AI) continues to advance rapidly, Friendly AI (FAI) has been proposed to advocate for more equitable and fair development of AI. Despite its importance, there is a lack of comprehensive reviews examining FAI from an ethical perspective, as well as limited discussion on its potential applications and future directions. This paper addresses these gaps by providing a thorough review of FAI, focusing on theoretical perspectives both for and against its development, and presenting a formal definition in a clear and accessible format. Key applications are discussed from the perspectives of eXplainable AI (XAI), privacy, fairness and affective computing (AC). Additionally, the paper identifies challenges in current technological advancements and explores future research avenues. The findings emphasise the significance of developing FAI and advocate for its continued advancement to ensure ethical and beneficial AI development.
Classification of Spontaneous and Scripted Speech for Multilingual Audio
Elisha, Shahar, McDowell, Andrew, Beguerisse-Díaz, Mariano, Benetos, Emmanouil
Distinguishing scripted from spontaneous speech is an essential tool for better understanding how speech styles influence speech processing research. It can also improve recommendation systems and discovery experiences for media users through better segmentation of large recorded speech catalogues. This paper addresses the challenge of building a classifier that generalises well across different formats and languages. We systematically evaluate models ranging from those based on traditional, handcrafted acoustic and prosodic features to advanced audio transformers, utilising a large, multilingual proprietary podcast dataset for training and validation. We break down the performance of each model across 11 language groups to evaluate cross-lingual biases. Our experimental analysis extends to publicly available datasets to assess the models' generalisability to non-podcast domains. Our results indicate that transformer-based models consistently outperform traditional feature-based techniques, achieving state-of-the-art performance in distinguishing between scripted and spontaneous speech across various languages.
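As a hedged sketch of the handcrafted-feature end of this spectrum, the snippet below summarises pitch and energy statistics per recording and feeds them to a conventional classifier; the concrete feature set is an assumption, not the paper's.

```python
# Prosodic summary features for scripted-vs-spontaneous classification.
import librosa
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def prosodic_features(path: str) -> np.ndarray:
    y, sr = librosa.load(path, sr=16000)
    f0, voiced, _ = librosa.pyin(y, fmin=65, fmax=400, sr=sr)
    rms = librosa.feature.rms(y=y)[0]
    # Fixed-length vector of pitch and energy summary statistics.
    return np.array([
        np.nanmean(f0), np.nanstd(f0), voiced.mean(),  # pitch level/variability, voicing rate
        rms.mean(), rms.std(),                          # energy level/variability
    ])

# Usage (paths and labels supplied by the corpus at hand):
# X = np.stack([prosodic_features(p) for p in paths])
# RandomForestClassifier().fit(X, labels)  # 1 = scripted, 0 = spontaneous
```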
autrainer: A Modular and Extensible Deep Learning Toolkit for Computer Audition Tasks
Rampp, Simon, Triantafyllopoulos, Andreas, Milling, Manuel, Schuller, Björn W.
Reproducibility, code quality, and development speed constitute the 'impossible trinity' of contemporary experimental artificial intelligence (AI) research. Of the three, the first has attracted the most attention in recent literature [1], as reproducibility of findings is a cornerstone of science. However, the impact of the other two should not be underestimated. Development speed allows the quick iteration of ideas - a necessary prerequisite in experimental sciences and a prominent feature of AI research, as asserted by "The Bitter Lesson" of R. Sutton [2]. Similarly, code quality can be the key differentiating factor when it comes to "standing on the shoulders of giants", as shaky foundations can lead to a spectacular collapse. This is why toolkits that are easy to use and provide pre-baked reproducibility are critical for the proliferation and adaptation of new ideas. The not-so-recent renaissance of deep learning (DL) has been largely driven by the creation of such toolkits.
Audio-based Kinship Verification Using Age Domain Conversion
Sun, Qiyang, Akman, Alican, Jing, Xin, Milling, Manuel, Schuller, Björn W.
Audio-based kinship verification (AKV) is important in many domains, such as home security monitoring, forensic identification, and social network analysis. A key challenge in the task arises from differences in age across samples from different individuals, which can be interpreted as a domain bias in a cross-domain verification task. To address this issue, we introduce the notion of an "age-standardised domain", wherein we utilise the optimised CycleGAN-VC3 network to perform age-audio conversion and generate in-domain audio. The generated audio dataset is employed to extract a range of features, which are then fed into a metric learning architecture to verify kinship. Experiments are conducted on the KAN_AV audio dataset, which contains age and kinship labels. The results demonstrate that the method markedly enhances the accuracy of kinship verification, while also offering novel insights for future kinship verification research.
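A minimal sketch of the metric-learning verification stage is given below, using a contrastive loss over pairs of (age-standardised) utterance embeddings; the encoder, embedding size, and margin are illustrative assumptions, with the CycleGAN-VC3 age conversion happening upstream.

```python
# Contrastive metric learning over utterance-pair embeddings for kinship.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmbeddingNet(nn.Module):
    """Projects utterance-level features into a kinship embedding space."""
    def __init__(self, feat_dim: int = 512, emb_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 256), nn.ReLU(), nn.Linear(256, emb_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings

def contrastive_loss(e1, e2, kin, margin: float = 0.5):
    # Pull kin pairs together; push non-kin pairs beyond the margin.
    d = (e1 - e2).norm(dim=-1)
    return (kin * d.pow(2) + (1 - kin) * (margin - d).clamp(min=0).pow(2)).mean()

net = EmbeddingNet()
a, b = torch.randn(4, 512), torch.randn(4, 512)  # age-standardised audio features
kin = torch.tensor([1., 0., 1., 0.])             # 1 = kin pair, 0 = non-kin
loss = contrastive_loss(net(a), net(b), kin)
```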