VoiceBench: Benchmarking LLM-Based Voice Assistants
Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li
–arXiv.org Artificial Intelligence
Building on the success of large language models (LLMs), recent advances such as GPT-4o have enabled real-time speech interaction through LLM-based voice assistants, offering a significantly improved user experience over traditional text-based interaction. However, the absence of benchmarks designed to evaluate these speech interaction capabilities has hindered progress in the development of LLM-based voice assistants. Current evaluations focus primarily on automatic speech recognition (ASR) or general knowledge assessment with clean speech, neglecting the more intricate, real-world scenarios that involve diverse speaker characteristics, environmental conditions, and content variations. To address this, we introduce VoiceBench, the first benchmark designed to provide a multi-faceted evaluation of LLM-based voice assistants. VoiceBench includes both real and synthetic spoken instructions that incorporate these three key real-world variations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.
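To make the multi-faceted setup concrete, the sketch below shows one way an evaluation loop over the three variation axes (speaker, environment, content) could be organized. This is a minimal illustration only; the names `SpokenInstruction`, `assistant_respond`, and `score_response` are hypothetical placeholders and do not reflect the benchmark's actual interface.

```python
# Hypothetical sketch of a multi-faceted evaluation over the three real-world
# variation axes described in the abstract. All names below are illustrative
# placeholders, not VoiceBench's actual API.
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class SpokenInstruction:
    audio_path: str    # spoken instruction (real or synthetic recording)
    reference: str     # reference answer used for scoring
    speaker: str       # speaker-characteristic condition, e.g. "accented"
    environment: str   # environmental condition, e.g. "clean", "noisy"
    content: str       # content condition, e.g. "disfluent", "ungrammatical"


def assistant_respond(audio_path: str) -> str:
    """Placeholder: query the LLM-based voice assistant under test."""
    raise NotImplementedError


def score_response(response: str, reference: str) -> float:
    """Placeholder: score a response against its reference answer."""
    raise NotImplementedError


def evaluate(instructions: list[SpokenInstruction]) -> dict[str, dict[str, float]]:
    """Aggregate scores separately along each variation axis."""
    buckets: dict[str, dict[str, list[float]]] = {
        "speaker": defaultdict(list),
        "environment": defaultdict(list),
        "content": defaultdict(list),
    }
    for item in instructions:
        score = score_response(assistant_respond(item.audio_path), item.reference)
        buckets["speaker"][item.speaker].append(score)
        buckets["environment"][item.environment].append(score)
        buckets["content"][item.content].append(score)
    return {
        axis: {condition: mean(scores) for condition, scores in conditions.items()}
        for axis, conditions in buckets.items()
    }
```

Reporting per-condition averages along each axis, rather than a single pooled score, is what lets such an evaluation expose where an assistant degrades under specific speaker, environmental, or content variations.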
Dec-11-2024