GatedxLSTM: A Multimodal Affective Computing Approach for Emotion Recognition in Conversations

Yupei Li, Qiyang Sun, Sunil Munthumoduku Krishna Murthy, Emran Alturki, Björn W. Schuller

arXiv.org Artificial Intelligence 

Yupei Li and Qiyang Sun contributed equally to this work. Sunil Munthumoduku Krishna Murthy is with CHI - Chair of Health Informatics, MRI, Technical University of Munich, Germany (e-mail: sunil.munthumoduku@tum.de). Björn W. Schuller (Fellow, IEEE) is with GLAM, Department of Computing, Imperial College London, UK; CHI - Chair of Health Informatics, Technical University of Munich, Germany; relAI - the Konrad Zuse School of Excellence in Reliable AI, Munich, Germany; MDSI - Munich Data Science Institute, Munich, Germany; and MCML - Munich Center for Machine Learning, Munich, Germany (e-mail: schuller@tum.de).

Abstract -- Affective Computing (AC) is essential for advancing Artificial General Intelligence (AGI), with emotion recognition serving as a key component. However, human emotions are inherently dynamic, influenced not only by an individual's expressions but also by interactions with others, and single-modality approaches often fail to capture their full dynamics. Multimodal Emotion Recognition (MER) leverages multiple signals but traditionally relies on utterance-level analysis, overlooking the dynamic nature of emotions in conversations. Emotion Recognition in Conversation (ERC) addresses this limitation, yet existing methods struggle to align multimodal features and to explain why emotions evolve within dialogues. To bridge this gap, we propose GatedxLSTM, a novel speech-text multimodal ERC model that explicitly considers the voice and transcripts of both the speaker and their conversational partner(s) to identify the utterances most responsible for driving emotional shifts. By integrating Contrastive Language-Audio Pretraining (CLAP) for improved cross-modal alignment and employing a gating mechanism to emphasise emotionally impactful utterances, GatedxLSTM enhances both interpretability and performance. Experiments on the IEMOCAP dataset demonstrate that GatedxLSTM achieves state-of-the-art (SOTA) performance among open-source methods in four-class emotion classification. These results validate its effectiveness for ERC applications, and we further provide an interpretability analysis from a psychological perspective.

INTRODUCTION

Artificial General Intelligence (AGI) represents a key future direction in AI development, with Affective Computing (AC) playing a crucial role in enhancing AGI's ability to interact effectively with humans.
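For illustration only, the minimal sketch below captures the gating idea summarised in the abstract: per-utterance audio and text embeddings (e.g., CLAP-style features) are fused, re-weighted by a scalar gate that emphasises emotionally influential utterances, and passed through a recurrent backbone. It is not the paper's implementation; a standard nn.LSTM stands in for the xLSTM block, and all module names and dimensions are assumptions chosen for demonstration.

```python
# Minimal, illustrative sketch (not the authors' implementation): per-utterance
# audio/text embeddings (e.g., from CLAP) are fused, passed through a scalar
# gate that re-weights emotionally influential utterances, and fed to a
# recurrent backbone. A standard nn.LSTM stands in for the paper's xLSTM block.
import torch
import torch.nn as nn


class GatedUtteranceEncoder(nn.Module):
    def __init__(self, emb_dim: int = 512, hidden_dim: int = 256, num_classes: int = 4):
        super().__init__()
        # Fuse the audio and text embeddings of each utterance.
        self.fuse = nn.Linear(2 * emb_dim, hidden_dim)
        # Scalar gate in (0, 1) per utterance: how strongly it influences the dialogue state.
        self.gate = nn.Sequential(nn.Linear(hidden_dim, 1), nn.Sigmoid())
        # Recurrent backbone over the gated utterance sequence (xLSTM in the paper).
        self.rnn = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, audio_emb: torch.Tensor, text_emb: torch.Tensor):
        # audio_emb, text_emb: (batch, num_utterances, emb_dim) pre-extracted embeddings.
        fused = torch.tanh(self.fuse(torch.cat([audio_emb, text_emb], dim=-1)))
        gates = self.gate(fused)                  # (batch, num_utterances, 1)
        out, _ = self.rnn(fused * gates)          # emphasise high-gate utterances
        logits = self.classifier(out)             # per-utterance 4-class emotion logits
        return logits, gates.squeeze(-1)          # gates double as an interpretability signal


if __name__ == "__main__":
    # Toy dialogue: 2 conversations, 6 utterances each, 512-d embeddings.
    audio = torch.randn(2, 6, 512)
    text = torch.randn(2, 6, 512)
    logits, gates = GatedUtteranceEncoder()(audio, text)
    print(logits.shape, gates.shape)  # torch.Size([2, 6, 4]) torch.Size([2, 6])
```

In this reading, the gate values provide the interpretability signal mentioned in the abstract: utterances with larger gates are those the model treats as most influential for emotional shifts in the conversation.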