Investigating the Representation of Backchannels and Fillers in Fine-tuned Language Models
Wang, Yu; Lao, Leyi; Huang, Langchu; Skantze, Gabriel; Xu, Yang; Buschmeier, Hendrik
arXiv.org Artificial Intelligence
Backchannels and fillers are important linguistic expressions in dialogue, but are under-represented in modern transformer-based language models (LMs). We study how they are represented in LMs using three fine-tuning strategies. The models are trained on three dialogue corpora in English and Japanese, in which backchannels and fillers are preserved and annotated, to investigate how fine-tuning helps LMs learn these representations. We first apply clustering analysis to the learned representations of backchannels and fillers and find higher silhouette scores for representations from fine-tuned models, which suggests that fine-tuning enables LMs to distinguish nuanced semantic variation across different backchannel and filler uses. We also use natural language generation (NLG) metrics to confirm that utterances generated by fine-tuned LMs resemble human-produced utterances more closely. Our findings suggest the potential of transforming general LMs into conversational LMs that are more capable of producing human-like language.
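To make the clustering claim concrete, here is a minimal sketch of how a silhouette score compares two sets of representations. It is not the authors' pipeline: the 2-D vectors, the two cluster labels, and the names `base_vectors` and `tuned_vectors` are hypothetical toy values chosen for illustration, and Euclidean distance is an assumption. A higher mean silhouette means that points lie closer to their own cluster than to the nearest other cluster.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (range -1 to 1)."""
    scores = []
    for i, (p, own_label) in enumerate(zip(points, labels)):
        # Collect distances from p to every other point, grouped by cluster.
        by_cluster = {}
        for j, (q, other_label) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(other_label, []).append(euclidean(p, q))
        own = by_cluster.pop(own_label, [])
        if not own or not by_cluster:
            # Singleton cluster or no other cluster: score 0 by convention.
            scores.append(0.0)
            continue
        a = sum(own) / len(own)  # mean distance within own cluster
        b = min(sum(d) / len(d) for d in by_cluster.values())  # nearest other cluster
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: four token vectors, two usage categories (0 and 1).
labels = [0, 0, 1, 1]
# Assumed "base model" vectors: the two categories are interleaved.
base_vectors = [(0.0, 0.0), (10.0, 0.0), (0.0, 1.0), (10.0, 1.0)]
# Assumed "fine-tuned" vectors: each category forms a tight, separate group.
tuned_vectors = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

base_score = silhouette_score(base_vectors, labels)
tuned_score = silhouette_score(tuned_vectors, labels)
print(f"base: {base_score:.3f}, tuned: {tuned_score:.3f}")
```

In this toy setup the interleaved vectors score below zero and the separated ones score close to one, which is the direction of change the abstract reports for fine-tuned models.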
Sep-25-2025
- Country:
  - Asia
    - China > Guangdong Province
      - Shenzhen (0.04)
    - Japan > Honshū
      - Kansai > Kyoto Prefecture > Kyoto (0.04)
  - Europe
    - Estonia > Tartu County
      - Tartu (0.04)
    - Germany > Saarland
      - Saarbrücken (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.14)
      - Greater London > London (0.04)
    - France > Provence-Alpes-Côte d'Azur
      - Bouches-du-Rhône > Marseille (0.04)
    - Italy > Tuscany
      - Florence (0.04)
    - Netherlands > North Holland
      - Amsterdam (0.04)
    - Austria > Vienna (0.04)
    - Sweden > Stockholm
      - Stockholm (0.04)
  - North America
    - Canada > Ontario
      - Toronto (0.04)
    - Dominican Republic (0.04)
    - United States
      - Florida > Miami-Dade County
        - Miami (0.04)
      - Illinois > Cook County
        - Chicago (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
- Genre:
  - Research Report > New Finding (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning
      - Neural Networks > Deep Learning (1.00)
      - Statistical Learning (1.00)
    - Natural Language
      - Chatbot (0.96)
      - Large Language Model (1.00)
    - Speech > Speech Recognition (1.00)