Modeling Turn-Taking with Semantically Informed Gestures
Varsha Suresh, M. Hamza Mughal, Christian Theobalt, Vera Demberg
In conversation, humans use multimodal cues, such as speech, gestures, and gaze, to manage turn-taking. While linguistic and acoustic features are informative, gestures provide complementary cues for modeling these transitions. To study this, we introduce DnD Gesture++, an extension of the multi-party DnD Gesture corpus enriched with 2,663 semantic gesture annotations spanning iconic, metaphoric, deictic, and discourse types. Using this dataset, we model turn-taking prediction through a Mixture-of-Experts framework integrating text, audio, and gestures. Experiments show that incorporating semantically guided gestures yields consistent performance gains over baselines, demonstrating their complementary role in multimodal turn-taking.
arXiv.org Artificial Intelligence
Oct-23-2025
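The abstract describes a Mixture-of-Experts framework that fuses text, audio, and gesture features for turn-taking prediction. The sketch below illustrates one plausible reading of such a design: a per-modality expert, a gating network over the concatenated features, and a binary hold/shift classifier. The feature dimensions, expert architecture, and gating scheme are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch of a Mixture-of-Experts fusion for turn-taking prediction,
# assuming pre-extracted text, audio, and gesture features per utterance.
# Dimensions and the binary hold/shift target are illustrative assumptions.
import torch
import torch.nn as nn

class ModalityExpert(nn.Module):
    """One expert per modality: projects its features into a shared space."""
    def __init__(self, in_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

class MoETurnTaking(nn.Module):
    """Gating network weights the text, audio, and gesture experts,
    then a classifier predicts turn hold vs. turn shift."""
    def __init__(self, text_dim=768, audio_dim=128, gesture_dim=64, hidden_dim=256):
        super().__init__()
        self.experts = nn.ModuleDict({
            "text": ModalityExpert(text_dim, hidden_dim),
            "audio": ModalityExpert(audio_dim, hidden_dim),
            "gesture": ModalityExpert(gesture_dim, hidden_dim),
        })
        # Gate sees the concatenated raw features and emits one weight per expert.
        self.gate = nn.Linear(text_dim + audio_dim + gesture_dim, len(self.experts))
        self.classifier = nn.Linear(hidden_dim, 2)  # hold vs. shift

    def forward(self, text, audio, gesture):
        gate_logits = self.gate(torch.cat([text, audio, gesture], dim=-1))
        weights = torch.softmax(gate_logits, dim=-1)            # (batch, 3)
        expert_outs = torch.stack(
            [self.experts["text"](text),
             self.experts["audio"](audio),
             self.experts["gesture"](gesture)], dim=1)          # (batch, 3, hidden)
        fused = (weights.unsqueeze(-1) * expert_outs).sum(dim=1)
        return self.classifier(fused)

# Example forward pass on random features for a batch of 4 utterances.
model = MoETurnTaking()
logits = model(torch.randn(4, 768), torch.randn(4, 128), torch.randn(4, 64))
print(logits.shape)  # torch.Size([4, 2])
```

Under this reading, the gating weights make the contribution of the gesture expert explicit, which is one way the reported gains from semantically guided gestures could be inspected.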