EMO-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition
Shi, Jiacheng, Du, Hongfei, Hong, Y. Alicia, Gao, Ye
arXiv.org Artificial Intelligence
Speech emotion recognition (SER) with audio-language models (ALMs) remains vulnerable to distribution shifts at test time, leading to performance degradation in out-of-domain scenarios. Test-time adaptation (TTA) provides a promising solution but often relies on gradient-based updates or prompt tuning, limiting flexibility and practicality. We propose Emo-TTA, a lightweight, training-free adaptation framework that incrementally updates class-conditional statistics via an Expectation-Maximization procedure for explicit test-time distribution estimation, using ALM predictions as priors. Emo-TTA operates on individual test samples without modifying model weights. Experiments on six out-of-domain SER benchmarks show consistent accuracy improvements over prior TTA baselines, demonstrating the effectiveness of statistical adaptation in aligning model predictions with evolving test distributions.
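The abstract describes the idea only at a high level, so the following is a minimal sketch of one way an incremental, gradient-free EM update of class-conditional statistics could look. It is not the authors' implementation. It assumes Gaussian class-conditionals with a shared isotropic variance, and the names `OnlineClassStats`, `init_means`, `var`, and `prior` are hypothetical. Here `prior` stands in for the ALM's zero-shot class probabilities for the current test sample.

```python
import numpy as np


class OnlineClassStats:
    """Running class-conditional means, updated one test sample at a time.

    Assumption: each class is modeled as a Gaussian with a shared
    isotropic variance `var`. The abstract does not state this choice.
    """

    def __init__(self, init_means, var=1.0):
        self.means = np.asarray(init_means, dtype=float).copy()  # shape (C, D)
        # Soft counts; each initial mean is treated as one pseudo-sample.
        self.counts = np.ones(len(self.means))
        self.var = float(var)

    def posterior(self, x, prior):
        """E-step: responsibilities proportional to prior * N(x | mean_c, var * I)."""
        x = np.asarray(x, dtype=float)
        prior = np.asarray(prior, dtype=float)
        sq_dist = ((self.means - x) ** 2).sum(axis=1)
        log_post = np.log(prior + 1e-12) - 0.5 * sq_dist / self.var
        log_post -= log_post.max()  # subtract the max for numerical stability
        post = np.exp(log_post)
        return post / post.sum()

    def update(self, x, prior):
        """Incremental M-step: move each mean toward x, weighted by its responsibility."""
        x = np.asarray(x, dtype=float)
        post = self.posterior(x, prior)
        self.counts += post
        self.means += (post / self.counts)[:, None] * (x - self.means)
        return post


# Usage: two classes in a 2-D embedding space, uniform prior.
stats = OnlineClassStats(init_means=[[0.0, 0.0], [4.0, 4.0]])
post = stats.update(x=[3.5, 4.2], prior=[0.5, 0.5])
pred = int(np.argmax(post))
```

No model weights are touched in this sketch: only the running means and soft counts change, which matches the abstract's description of a training-free method that operates on individual test samples.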
Oct-1-2025
- Country:
- Asia (0.04)
- Europe > Italy
- Calabria > Catanzaro Province > Catanzaro (0.04)
- North America > Canada
- Genre:
- Research Report (0.70)
- Technology:
- Information Technology > Artificial Intelligence
- Cognitive Science > Emotion (0.72)
- Machine Learning (1.00)
- Natural Language (1.00)
- Speech > Speech Recognition (0.93)