Enhancing Speech Instruction Understanding and Disambiguation in Robotics via Speech Prosody
David Sasu, Kweku Andoh Yamoah, Benedict Quartey, Natalie Schluter
arXiv.org Artificial Intelligence
Enabling robots to accurately interpret and execute spoken language instructions is essential for effective human-robot collaboration. Traditional methods rely on speech recognition to transcribe speech into text, often discarding crucial prosodic cues needed for disambiguating intent. We propose a novel approach that directly leverages speech prosody to infer and resolve instruction intent. Predicted intents are integrated into large language models via in-context learning to disambiguate and select appropriate task plans. Additionally, we present the first ambiguous speech dataset for robotics, designed to advance research in speech disambiguation. Our method achieves 95.79% accuracy in detecting referent intents within an utterance and determines the intended task plan of ambiguous instructions with 71.96% accuracy, demonstrating its potential to significantly improve human-robot communication.
Jun-4-2025
- Country:
  - Asia > China
  - Europe
    - Denmark > Capital Region > Copenhagen (0.04)
    - United Kingdom > England > Cambridgeshire > Cambridge (0.04)
  - North America > United States (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks > Deep Learning (0.97)
    - Natural Language > Large Language Model (0.95)
    - Robots (1.00)
    - Speech > Speech Recognition (0.90)