VoiceBench: Benchmarking LLM-Based Voice Assistants
Yiming Chen, Xianghu Yue, Chen Zhang, Xiaoxue Gao, Robby T. Tan, Haizhou Li
–arXiv.org Artificial Intelligence
Building on the success of large language models (LLMs), recent advances such as GPT-4o have enabled real-time speech interaction through LLM-based voice assistants, offering a significantly improved user experience over traditional text-based interaction. However, the absence of benchmarks designed to evaluate these speech interaction capabilities has hindered progress in the development of LLM-based voice assistants. Current evaluations focus primarily on automatic speech recognition (ASR) or general knowledge assessment with clean speech, neglecting the more intricate, real-world scenarios that involve diverse speaker characteristics, environmental conditions, and content variations. To address this, we introduce VoiceBench, the first benchmark designed to provide a multi-faceted evaluation of LLM-based voice assistants. VoiceBench includes both real and synthetic spoken instructions that incorporate these three key real-world variations. Extensive experiments reveal the limitations of current LLM-based voice assistant models and offer valuable insights for future research and development in this field.
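To make the multi-faceted setup concrete, the sketch below shows one way an evaluation loop over the three variation axes (speaker, environment, content) could be organized. This is a minimal illustration only; the names `SpokenInstruction`, `assistant_respond`, and `score_response` are hypothetical placeholders and do not reflect the benchmark's actual interface.

```python
# Hypothetical sketch of a multi-faceted evaluation over the three real-world
# variation axes described in the abstract. All names below are illustrative
# placeholders, not VoiceBench's actual API.
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class SpokenInstruction:
    audio_path: str    # spoken instruction (real or synthetic recording)
    reference: str     # reference answer used for scoring
    speaker: str       # speaker-characteristic condition, e.g. "accented"
    environment: str   # environmental condition, e.g. "clean", "noisy"
    content: str       # content condition, e.g. "disfluent", "ungrammatical"


def assistant_respond(audio_path: str) -> str:
    """Placeholder: query the LLM-based voice assistant under test."""
    raise NotImplementedError


def score_response(response: str, reference: str) -> float:
    """Placeholder: score a response against its reference answer."""
    raise NotImplementedError


def evaluate(instructions: list[SpokenInstruction]) -> dict[str, dict[str, float]]:
    """Aggregate scores separately along each variation axis."""
    buckets: dict[str, dict[str, list[float]]] = {
        "speaker": defaultdict(list),
        "environment": defaultdict(list),
        "content": defaultdict(list),
    }
    for item in instructions:
        score = score_response(assistant_respond(item.audio_path), item.reference)
        buckets["speaker"][item.speaker].append(score)
        buckets["environment"][item.environment].append(score)
        buckets["content"][item.content].append(score)
    return {
        axis: {condition: mean(scores) for condition, scores in conditions.items()}
        for axis, conditions in buckets.items()
    }
```

Reporting per-condition averages along each axis, rather than a single pooled score, is what lets such an evaluation expose where an assistant degrades under specific speaker, environmental, or content variations.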
Dec-11-2024