Enhancing Dialogue Annotation with Speaker Characteristics Leveraging a Frozen LLM

Thomas Thebaud, Yen-Ju Lu, Matthew Wiesner, Peter Viechnicki, Najim Dehak

arXiv.org Artificial Intelligence 

ABSTRACT

In dialogue transcription pipelines, Large Language Models (LLMs) are frequently employed in post-processing to improve grammar, punctuation, and readability. We explore a complementary post-processing step: enriching transcribed dialogues by adding metadata tags for speaker characteristics such as age, gender, and emotion. Some of the tags are global to the entire dialogue, while some are time-variant. Our approach couples frozen audio foundation models, such as Whisper or WavLM, with a frozen LLAMA language model to infer these speaker attributes, without requiring task-specific fine-tuning of either model. Using lightweight, efficient connectors to bridge audio and language representations, we achieve competitive performance on speaker profiling tasks while preserving modularity and speed. Additionally, we demonstrate that a frozen LLAMA model can compare x-vectors directly, achieving an Equal Error Rate of 8.8% in some scenarios.

Keywords: Large Language Models, Speaker Characterization, Automatic Speaker Verification, Emotion Recognition, Connectors

1. INTRODUCTION

With the widespread adoption of voice-assisted technologies, automatic transcription services, and real-time speech translation, speech processing tasks have become increasingly prevalent in both consumer and industrial applications.
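The abstract does not specify the connector architecture, but the general pattern it describes can be sketched as a small trainable projection that maps frozen audio-encoder features into the frozen LLM's embedding space. The sketch below is a hypothetical illustration, not the authors' implementation: the dimensions (768 for a WavLM-style encoder, 4096 for a LLAMA-style LLM) and the two-layer MLP design are assumptions.

```python
import numpy as np

class Connector:
    """Hypothetical lightweight connector (illustration only).

    Projects frozen audio-encoder frame features (assumed 768-dim, as in
    WavLM-base) into an LLM embedding space (assumed 4096-dim, as in
    LLAMA). Both foundation models stay frozen; only these projection
    weights would be trained in such a setup.
    """

    def __init__(self, audio_dim=768, llm_dim=4096, hidden_dim=1024, seed=0):
        rng = np.random.default_rng(seed)
        # Scaled random init; a real system would learn these weights.
        self.w1 = rng.standard_normal((audio_dim, hidden_dim)) / np.sqrt(audio_dim)
        self.w2 = rng.standard_normal((hidden_dim, llm_dim)) / np.sqrt(hidden_dim)

    def __call__(self, feats):
        # feats: (n_frames, audio_dim) -> soft-prompt vectors: (n_frames, llm_dim)
        h = np.maximum(feats @ self.w1, 0.0)  # ReLU hidden layer
        return h @ self.w2

conn = Connector()
audio_feats = np.ones((50, 768))   # 50 frames of dummy audio features
soft_prompts = conn(audio_feats)
print(soft_prompts.shape)          # (50, 4096)
```

The projected vectors would be prepended or interleaved with the text-token embeddings fed to the frozen LLM, which then answers attribute queries (age, gender, emotion) in natural language.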
