Applying General Turn-taking Models to Conversational Human-Robot Interaction

Jan-15-2025 – arXiv.org Artificial Intelligence

Turn-taking is a fundamental aspect of conversation, but current Human-Robot Interaction (HRI) systems often rely on simplistic, silence-based models, leading to unnatural pauses and interruptions. This paper investigates, for the first time, the application of general turn-taking models, specifically TurnGPT and Voice Activity Projection (VAP), to improve conversational dynamics in HRI. These models are trained on human-human dialogue data using self-supervised learning objectives, without requiring domain-specific fine-tuning. We propose methods for using these models in tandem to predict when a robot should begin preparing responses, take turns, and handle potential interruptions. We evaluated the proposed system in a within-subject study against a traditional baseline system, using the Furhat robot with 39 adults in a conversational setting, in combination with a large language model for autonomous response generation. The results show that participants significantly prefer the proposed system, and it significantly reduces response delays and interruptions.
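To make the contrast in the abstract concrete, here is a minimal sketch of a silence-based policy next to a prediction-gated one. It is not the paper's implementation. The frame length, the thresholds, the function names, and the `shift_probs` input (a stand-in for a per-frame turn-shift probability such as a model like VAP or TurnGPT might provide) are all assumptions made for illustration.

```python
from typing import List, Optional

FRAME_MS = 20  # assumed frame length in milliseconds (illustrative only)


def silence_based_turn_time(vad_frames: List[bool],
                            threshold_ms: int = 700) -> Optional[int]:
    """Baseline: take the turn once silence has lasted threshold_ms.

    vad_frames[i] is True when the user is speaking in frame i.
    Returns the time (ms) at which the robot takes the turn, or None.
    """
    silent_ms = 0
    for i, is_speech in enumerate(vad_frames):
        silent_ms = 0 if is_speech else silent_ms + FRAME_MS
        if silent_ms >= threshold_ms:
            return (i + 1) * FRAME_MS
    return None


def predictive_turn_time(vad_frames: List[bool],
                         shift_probs: List[float],
                         prob_threshold: float = 0.6,
                         min_silence_ms: int = 100) -> Optional[int]:
    """Hypothetical predictive policy: take the turn after a short silence,
    but only if the (assumed) turn-shift probability is high enough.
    """
    silent_ms = 0
    for i, (is_speech, p) in enumerate(zip(vad_frames, shift_probs)):
        silent_ms = 0 if is_speech else silent_ms + FRAME_MS
        if silent_ms >= min_silence_ms and p >= prob_threshold:
            return (i + 1) * FRAME_MS
    return None


# Toy utterance: 200 ms speech, 300 ms mid-utterance pause,
# 200 ms speech (ends at 700 ms), then 1000 ms of silence.
frames = [True] * 10 + [False] * 15 + [True] * 10 + [False] * 50
# Made-up probabilities: low during the mid-utterance pause, high at the end.
probs = [0.1] * 35 + [0.9] * 50

print(silence_based_turn_time(frames, threshold_ms=700))  # 1400: 700 ms response delay
print(silence_based_turn_time(frames, threshold_ms=200))  # 400: cuts in during the pause
print(predictive_turn_time(frames, probs))                # 800: 100 ms after the user finishes
```

In this toy example, a long silence threshold yields a slow response, a short one interrupts the user mid-utterance, and the prediction-gated policy avoids both, which mirrors the trade-off the abstract attributes to silence-based systems.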

large language model, machine learning, natural language


Country:
- Asia
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
  - Singapore (0.04)
- Europe
  - Portugal > Lisbon
    - Lisbon (0.04)
  - Sweden > Stockholm
    - Stockholm (0.04)
  - Switzerland > Basel-City
    - Basel (0.04)
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
- North America > United States
  - Illinois > Cook County
    - Chicago (0.04)
  - New York > New York County
    - New York City (0.04)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Media > Film (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
  - Natural Language
    - Discourse & Dialogue (1.00)
    - Large Language Model (1.00)
  - Robots (1.00)
  - Speech > Speech Recognition (0.93)
