Context and System Fusion in Post-ASR Emotion Recognition with Large Language Models

Pavel Stepachev, Pinzhen Chen, Barry Haddow

Oct-4-2024, arXiv.org Artificial Intelligence

Large language models (LLMs) have started to play a vital role in modelling speech and text. To explore the best use of context and multiple systems' outputs for post-ASR speech emotion prediction, we study LLM prompting on a recent task named GenSEC. Our techniques include ASR transcript ranking, variable conversation context, and system output fusion. We show that the conversation context has diminishing returns and that the metric used to select the transcript for prediction is crucial.

Formally, our approach explores suitable prompting strategies to perform speech emotion prediction from ASR outputs without access to the speech signal. Most of the effort is centred on creating a practical context for prompting. Methodologically, the contributions of this work are to 1) select and rank ASR outputs as LLM input using multiple metrics and 2) exploit and fuse the conversation history and the outputs of multiple ASR systems.
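
The three techniques named above (transcript ranking, variable conversation context, and system output fusion) can be pictured as a single prompt-construction step. What follows is a minimal sketch under our own assumptions, not the authors' implementation. The ranking criterion is hypothetical: it uses the mean word-level edit distance to the other systems' hypotheses as a consensus measure, where the paper uses multiple metrics. The four-way label set, the prompt wording, and every function name (`word_edit_distance`, `rank_hypotheses`, `build_prompt`) are also hypothetical.

```python
from typing import Dict, List, Sequence

# Hypothetical label set (assumption, not taken from the paper).
LABELS = ("angry", "happy", "neutral", "sad")


def word_edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance computed over word tokens."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, wb in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,                # delete a word from a
                curr[j - 1] + 1,            # insert a word from b
                prev[j - 1] + (wa != wb),   # substitute, or match at no cost
            )
        prev = curr
    return prev[len(b)]


def rank_hypotheses(hyps: Dict[str, str]) -> List[str]:
    """Rank ASR systems by mean distance to the other systems' outputs.

    Assumed consensus criterion: lower mean distance ranks first.
    Ties are broken by system name so the order is deterministic.
    """
    tokens = {name: text.lower().split() for name, text in hyps.items()}

    def mean_dist(name: str) -> float:
        others = [o for o in tokens if o != name]
        if not others:
            return 0.0
        return sum(word_edit_distance(tokens[name], tokens[o]) for o in others) / len(others)

    return sorted(hyps, key=lambda name: (mean_dist(name), name))


def build_prompt(history: List[str], current_hyps: Dict[str, str],
                 context_window: int = 3, fuse: bool = True) -> str:
    """Assemble a hypothetical emotion-prediction prompt.

    context_window: number of preceding utterances to include (0 = none).
    fuse: list all ranked hypotheses if True, else only the top-ranked one.
    """
    ranked = rank_hypotheses(current_hyps)
    lines = ["Predict the speaker's emotion from ASR transcripts."]
    if context_window > 0 and history:
        lines.append("Conversation context:")
        lines.extend(f"- {utt}" for utt in history[-context_window:])
    if fuse:
        lines.append("ASR hypotheses for the current utterance (best first):")
        lines.extend(f"{i}. {current_hyps[n]}" for i, n in enumerate(ranked, start=1))
    else:
        lines.append(f"Current utterance: {current_hyps[ranked[0]]}")
    lines.append("Answer with one of: " + ", ".join(LABELS) + ".")
    return "\n".join(lines)


if __name__ == "__main__":
    hyps = {
        "sys_a": "i can't believe you did that",
        "sys_b": "i can believe you did that",
        "sys_c": "i can't believe you did this",
    }
    print(build_prompt(["Where were you last night?", "Out with friends."], hyps,
                       context_window=1))
```

The two parameters of `build_prompt` loosely mirror the dimensions the abstract varies:

- `context_window` controls how much conversation history is included. Setting it to 0 drops the history entirely.
- `fuse=False` passes only the top-ranked transcript instead of all system outputs.

Swapping `rank_hypotheses` for another scoring function is how one would compare selection metrics.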

Keywords: accuracy, ASR output, recognition
