Applying General Turn-taking Models to Conversational Human-Robot Interaction
Skantze, Gabriel; Irfan, Bahar
arXiv.org Artificial Intelligence
Turn-taking is a fundamental aspect of conversation, but current Human-Robot Interaction (HRI) systems often rely on simplistic, silence-based models, leading to unnatural pauses and interruptions. This paper investigates, for the first time, the application of general turn-taking models, specifically TurnGPT and Voice Activity Projection (VAP), to improve conversational dynamics in HRI. These models are trained on human-human dialogue data using self-supervised learning objectives, without requiring domain-specific fine-tuning. We propose methods for using these models in tandem to predict when a robot should begin preparing responses, take turns, and handle potential interruptions. We evaluated the proposed system in a within-subject study against a traditional baseline system, using the Furhat robot with 39 adults in a conversational setting, in combination with a large language model for autonomous response generation. The results show that participants significantly prefer the proposed system, and it significantly reduces response delays and interruptions.
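The contrast the abstract draws between a silence-based baseline and prediction-based turn-taking can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how TurnGPT and VAP outputs are combined, so the `TurnSignals` fields, the `Action` values, and every threshold below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    WAIT = "wait"
    PREPARE = "prepare_response"
    TAKE_TURN = "take_turn"


@dataclass
class TurnSignals:
    silence_ms: float        # time elapsed since the user last produced speech
    text_end_prob: float     # assumed text-based end-of-turn probability (TurnGPT-like)
    audio_shift_prob: float  # assumed audio-based turn-shift probability (VAP-like)


def silence_baseline(signals: TurnSignals, threshold_ms: float = 1000.0) -> Action:
    """Baseline: take the turn only after a fixed stretch of silence."""
    if signals.silence_ms >= threshold_ms:
        return Action.TAKE_TURN
    return Action.WAIT


def predictive_policy(
    signals: TurnSignals, prepare_at: float = 0.5, take_at: float = 0.5
) -> Action:
    """Illustrative policy: use both predictions instead of a silence timer."""
    # The words so far do not look like a finished turn: keep listening,
    # even if the user has paused.
    if signals.text_end_prob < prepare_at:
        return Action.WAIT
    # The turn looks complete in text, the user has stopped speaking,
    # and the audio model also predicts a shift: respond now.
    if signals.silence_ms > 0 and signals.audio_shift_prob >= take_at:
        return Action.TAKE_TURN
    # The turn looks complete in text, but the user is still speaking or
    # the audio model disagrees: start preparing a response in advance.
    return Action.PREPARE


# A short pause after a complete-sounding utterance.
quick_end = TurnSignals(silence_ms=200, text_end_prob=0.9, audio_shift_prob=0.8)
# A long mid-sentence pause (the user is thinking).
hesitation = TurnSignals(silence_ms=1500, text_end_prob=0.1, audio_shift_prob=0.2)
# The user is still speaking, but the utterance already looks complete.
still_talking = TurnSignals(silence_ms=0, text_end_prob=0.7, audio_shift_prob=0.9)
```

With these made-up inputs, the baseline keeps waiting on `quick_end` (a response delay) and takes the turn on `hesitation` (an interruption), whereas the illustrative policy takes the turn on `quick_end`, waits on `hesitation`, and starts preparing a response on `still_talking`.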
Jan-15-2025
- Country:
  - Europe (0.93)
  - North America > United States (0.28)
- Genre:
  - Research Report > New Finding (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks
      - Deep Learning (1.00)
    - Natural Language
      - Discourse & Dialogue (1.00)
      - Large Language Model (1.00)
    - Robots (1.00)
    - Speech > Speech Recognition (0.93)