FLEXI: Benchmarking Full-duplex Human-LLM Speech Interaction
Ge, Yuan, Chen, Saihan, Xiao, Jingqi, Liu, Xiaoqian, Xiao, Tong, Xiang, Yan, Yu, Zhengtao, Zhu, Jingbo
arXiv.org Artificial Intelligence
Full-Duplex Speech-to-Speech Large Language Models (LLMs) are foundational to natural human-computer interaction, enabling real-time spoken dialogue systems. However, benchmarking and modeling them remains a fundamental challenge. We introduce FLEXI, the first benchmark for full-duplex LLM-human spoken interaction that explicitly incorporates model interruption in emergency scenarios. FLEXI systematically evaluates the latency, quality, and conversational effectiveness of real-time dialogue through six diverse human-LLM interaction scenarios, revealing significant gaps between open-source and commercial models in emergency awareness, turn termination, and interaction latency. Finally, we suggest that next token-pair prediction offers a promising path toward achieving truly seamless and human-like full-duplex interaction.
Sep-29-2025
- Country:
  - Asia > China
    - Liaoning Province > Shenyang (0.04)
    - Yunnan Province > Kunming (0.04)
  - Europe
    - Austria > Vienna (0.14)
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
- Genre:
  - Research Report (0.40)
- Technology: