Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models
Pavel Stepachev, Pinzhen Chen, Barry Haddow
arXiv.org Artificial Intelligence
Large language models (LLMs) have started to play a vital role in modelling speech and text. To explore the best use of context and multiple systems' outputs for post-ASR speech emotion prediction, we study LLM prompting on a recent task named GenSEC. Our techniques include ASR transcript ranking, variable conversation context, and system output fusion. We show that the conversation context has diminishing returns and that the metric used to select the transcript for prediction is crucial.

Formally, our approach explores suitable prompting strategies to perform speech emotion prediction from ASR outputs without speech signals. Most efforts are centred on creating a practical context for prompting. The contributions of this work are: methodologically, we 1) select and rank ASR outputs as LLM input using multiple metrics and 2) exploit and fuse the conversation history and multiple ASR system outputs.
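The abstract names three prompting ingredients (transcript ranking, a variable amount of conversation context, and fusion of several ASR systems' outputs) but not the exact metrics or prompt wording. The following is a minimal sketch under stated assumptions, not the authors' implementation. The ranking score (mean word-level `difflib` similarity of one system's transcript to the other systems' transcripts, a reference-free consensus heuristic) is a stand-in we chose and is not one of the paper's metrics; the function names `word_similarity`, `rank_hypotheses`, and `build_prompt`, the prompt template, the system names, and the four-way label set are all hypothetical.

```python
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    """Word-level similarity in [0, 1] using difflib's ratio().

    Assumption: a stand-in consensus measure, not a metric named in the paper.
    """
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def rank_hypotheses(hyps: dict[str, str]) -> list[tuple[str, float]]:
    """Rank ASR systems by how much their transcript agrees with the others.

    Each system scores the mean similarity of its transcript to every other
    system's transcript; higher agreement ranks first.
    """
    scores: dict[str, float] = {}
    for name, text in hyps.items():
        others = [t for other, t in hyps.items() if other != name]
        if others:
            scores[name] = sum(word_similarity(text, o) for o in others) / len(others)
        else:
            scores[name] = 1.0  # a single system is trivially top-ranked
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def build_prompt(
    history: list[tuple[str, str]],
    current_hyps: dict[str, str],
    speaker: str,
    k: int,
    labels: list[str],
    fuse: bool = True,
) -> str:
    """Assemble a text-only emotion prompt (hypothetical template).

    history      -- earlier turns as (speaker, transcript), oldest first
    current_hyps -- ASR system name -> transcript of the utterance to classify
    k            -- number of preceding turns to include (0 = no context)
    fuse         -- list every system's transcript (ranked) instead of only the top one
    """
    lines = [
        "Predict the emotion of the last utterance. "
        f"Answer with one of: {', '.join(labels)}."
    ]
    # Variable conversation context: keep only the k most recent turns.
    # (The k > 0 guard matters: history[-0:] would return the whole list.)
    for spk, text in (history[-k:] if k > 0 else []):
        lines.append(f"{spk}: {text}")
    ranked = rank_hypotheses(current_hyps)
    if fuse:
        # System output fusion: show all transcripts, best-ranked first.
        lines.append(f"Last utterance by {speaker}, from several ASR systems:")
        for i, (name, _score) in enumerate(ranked, start=1):
            lines.append(f"{i}. {current_hyps[name]}")
    else:
        # Transcript selection only: keep the top-ranked system's output.
        lines.append(f"{speaker}: {current_hyps[ranked[0][0]]}")
    lines.append("Emotion:")
    return "\n".join(lines)


if __name__ == "__main__":
    history = [("A", "hello there"), ("B", "hi how are you")]
    hyps = {
        "sysA": "i am so happy today",
        "sysB": "i am so happy to day",
        "sysC": "eye am sew happy today",
    }
    print(build_prompt(history, hyps, "A", k=1, labels=["neutral", "happy", "sad", "angry"]))
```

With `k=1` and `fuse=True`, the printed prompt keeps only the most recent context turn (B's) and lists all three transcripts with `sysA` first, because its transcript agrees most with the other two. Setting `fuse=False` reduces the prompt to transcript selection alone, and sweeping `k` is one way to probe the diminishing returns from conversation context that the abstract reports.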
Oct-4-2024