An LLM Benchmark for Addressee Recognition in Multi-modal Multi-party Dialogue
Inoue, Koji, Lala, Divesh, Elmers, Mikey, Ochi, Keiko, Kawahara, Tatsuya
arXiv.org Artificial Intelligence
Handling multi-party dialogue is a significant step in advancing spoken dialogue systems, and it necessitates tasks specific to multi-party interaction. To address this challenge, we are constructing a multi-modal multi-party dialogue corpus of triadic (three-participant) discussions. This paper focuses on the task of addressee recognition, i.e., identifying who is being addressed and thus expected to take the next turn, a component unique to multi-party dialogue systems. A subset of the corpus was annotated with addressee information, revealing that an explicit addressee is indicated in approximately 20% of conversational turns. To gauge the task's difficulty, we benchmarked a large language model (GPT-4o) on addressee recognition. GPT-4o achieved an accuracy only marginally above chance, underscoring the challenge addressee recognition poses in multi-party dialogue. These findings highlight the need for further research to enhance the capabilities of large language models in understanding and navigating the intricacies of multi-party conversational dynamics.
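The benchmark setup described above can be sketched as follows. This is a minimal illustration, not the authors' actual protocol: the participant labels, prompt wording, and helper names are hypothetical, and it assumes the triadic setting in which the candidate addressees of a turn are the two non-speaking participants (making uniform random guessing the ~50% "chance" reference point).

```python
import random

def build_prompt(turns, participants=("A", "B", "C")):
    """Format dialogue history into an addressee-recognition query.

    `turns` is a list of (speaker, utterance) pairs. The candidates are
    the two participants other than the last turn's speaker, reflecting
    the triadic setting described in the paper. Prompt wording is a
    hypothetical example, not the paper's actual prompt.
    """
    speaker = turns[-1][0]
    candidates = [p for p in participants if p != speaker]
    history = "\n".join(f"{s}: {u}" for s, u in turns)
    return (
        f"{history}\n\n"
        f"Who is {speaker}'s last utterance addressed to? "
        f"Answer with exactly one of: {', '.join(candidates)}."
    )

def random_baseline_accuracy(gold, participants=("A", "B", "C"), seed=0):
    """Accuracy of guessing uniformly among the two non-speaker
    candidates; with two candidates per turn this converges to 50%,
    the chance level GPT-4o is compared against."""
    rng = random.Random(seed)
    correct = 0
    for speaker, addressee in gold:
        candidates = [p for p in participants if p != speaker]
        if rng.choice(candidates) == addressee:
            correct += 1
    return correct / len(gold)
```

In an actual evaluation, each prompt would be sent to the model (e.g., via a chat-completion API call with `model="gpt-4o"`) and the returned label compared against the annotated addressee; the random baseline above supplies the chance floor for that comparison.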
Jan-27-2025