Server-side Rescoring of Spoken Entity-centric Knowledge Queries for Virtual Assistants
Zhang, Youyuan, Gondala, Sashank, Fraga-Silva, Thiago, Van Gysel, Christophe
arXiv.org Artificial Intelligence
On-device Virtual Assistants (VAs) powered by Automatic Speech Recognition (ASR) require effective knowledge integration to recognize challenging entity-rich queries. In this paper, we conduct an empirical study of modeling strategies for server-side rescoring of spoken information-domain queries using several categories of Language Models (LMs): N-gram word LMs and sub-word neural LMs. We investigate the combination of on-device and server-side signals, and demonstrate significant word error rate (WER) improvements of 23%-35% on various entity-centric query subpopulations by integrating server-side LMs, compared to performing ASR on-device only. We also compare LMs trained on domain data against a GPT-3 variant offered by OpenAI as a baseline. Finally, we show that model fusion of multiple server-side LMs trained from scratch most effectively combines the complementary strengths of each model and integrates knowledge learned from domain-specific data into a VA ASR system.
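To make the idea of server-side rescoring with model fusion concrete, the following is a minimal sketch, not the authors' implementation. It assumes the on-device ASR pass yields an n-best list with a log-score per hypothesis, and that each server-side LM returns a log-probability for the hypothesis text; the fused score is a simple weighted sum (log-linear interpolation). The names `Hypothesis`, `fused_score`, `rescore`, the LM labels `ngram` and `nnlm`, and all scores and weights are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hypothesis:
    text: str
    on_device_score: float        # log-score from the on-device pass (assumed)
    lm_scores: Dict[str, float]   # log-probs from server-side LMs (assumed)


def fused_score(hyp: Hypothesis, weights: Dict[str, float],
                on_device_weight: float = 1.0) -> float:
    """Weighted sum of the on-device score and each server-side LM score."""
    score = on_device_weight * hyp.on_device_score
    for name, weight in weights.items():
        score += weight * hyp.lm_scores[name]
    return score


def rescore(nbest: List[Hypothesis], weights: Dict[str, float],
            on_device_weight: float = 1.0) -> List[Hypothesis]:
    """Return the n-best list sorted by fused score, best first."""
    return sorted(nbest,
                  key=lambda h: fused_score(h, weights, on_device_weight),
                  reverse=True)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, 1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                           # deletion
                           cur[j - 1] + 1,                        # insertion
                           prev[j - 1] + (ref_word != hyp_word))) # substitution
        prev = cur
    return prev[-1] / len(ref)


# Illustrative, made-up scores: the on-device pass prefers the wrong entity.
nbest = [
    Hypothesis("how tall is shar day", -3.5, {"ngram": -14.0, "nnlm": -15.0}),
    Hypothesis("how tall is sade", -4.0, {"ngram": -9.0, "nnlm": -8.0}),
]
weights = {"ngram": 0.3, "nnlm": 0.3}  # hypothetical interpolation weights

best = rescore(nbest, weights)[0]
print(best.text)
```

In this toy example the on-device score alone ranks the misrecognized entity first, while the fused score promotes the hypothesis that both server-side LMs consider more likely. In practice the interpolation weights would be tuned on held-out data rather than set by hand.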
Nov-2-2023
- Country:
- North America > United States (0.04)
- Genre:
- Research Report
- Experimental Study (0.46)
- New Finding (0.68)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning > Neural Networks
- Deep Learning (1.00)
- Natural Language
- Chatbot (0.88)
- Large Language Model (1.00)
- Representation & Reasoning (1.00)
- Speech (1.00)