EMO-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition

Shi, Jiacheng, Du, Hongfei, Hong, Y. Alicia, Gao, Ye

Oct-1-2025–arXiv.org Artificial Intelligence

Speech emotion recognition (SER) with audio-language models (ALMs) remains vulnerable to distribution shifts at test time, leading to performance degradation in out-of-domain scenarios. Test-time adaptation (TTA) provides a promising solution but often relies on gradient-based updates or prompt tuning, limiting flexibility and practicality. We propose Emo-TTA, a lightweight, training-free adaptation framework that incrementally updates class-conditional statistics via an Expectation-Maximization procedure for explicit test-time distribution estimation, using ALM predictions as priors. Emo-TTA operates on individual test samples without modifying model weights. Experiments on six out-of-domain SER benchmarks show consistent accuracy improvements over prior TTA baselines, demonstrating the effectiveness of statistical adaptation in aligning model predictions with evolving test distributions.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

Oct-1-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report (0.70)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Machine Learning (1.00)
  - Speech > Speech Recognition (0.93)
  - Cognitive Science > Emotion (0.72)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found