Server-side Rescoring of Spoken Entity-centric Knowledge Queries for Virtual Assistants

Youyuan Zhang, Sashank Gondala, Thiago Fraga-Silva, Christophe Van Gysel

Nov-2-2023–arXiv.org Artificial Intelligence

On-device Virtual Assistants (VAs) powered by Automatic Speech Recognition (ASR) require effective knowledge integration to recognize challenging entity-rich queries. In this paper, we conduct an empirical study of modeling strategies for server-side rescoring of spoken information-domain queries using several categories of Language Models (LMs): N-gram word LMs and sub-word neural LMs. We investigate the combination of on-device and server-side signals and demonstrate significant Word Error Rate (WER) improvements of 23%-35% on various entity-centric query subpopulations by integrating server-side LMs, compared to performing ASR on-device only. We also compare LMs trained on domain data against a GPT-3 variant offered by OpenAI as a baseline. Finally, we show that model fusion of multiple server-side LMs trained from scratch most effectively combines the complementary strengths of each model and integrates knowledge learned from domain-specific data into a VA ASR system.
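As a rough illustration of the kind of n-best rescoring the abstract describes, the sketch below combines an on-device score with scores from several server-side LMs through a weighted (log-linear) sum and re-ranks the hypotheses. This is not the authors' implementation: the function name `rescore_nbest`, the toy scorers, the weights, and the example scores are all hypothetical and chosen only to make the idea concrete.

```python
from typing import Callable, Dict, List, Tuple

# Assumption: a server-side LM is modeled as a function that maps a
# hypothesis string to a log-probability-like score (higher is better).
Scorer = Callable[[str], float]


def rescore_nbest(
    nbest: List[Tuple[str, float]],
    server_lms: Dict[str, Scorer],
    weights: Dict[str, float],
    device_weight: float = 1.0,
) -> List[Tuple[str, float]]:
    """Re-rank (text, on_device_score) pairs with a weighted sum of scores."""
    rescored = []
    for text, device_score in nbest:
        total = device_weight * device_score
        for name, lm in server_lms.items():
            total += weights[name] * lm(text)
        rescored.append((text, total))
    # Best (highest combined score) hypothesis first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Toy stand-ins for an N-gram word LM and a sub-word neural LM (hypothetical).
def toy_ngram_lm(text: str) -> float:
    return -1.0 if "taylor swift" in text else -6.0


def toy_neural_lm(text: str) -> float:
    return -2.0 if "taylor swift" in text else -5.0


# The on-device recognizer slightly prefers the misrecognized entity.
nbest = [("who is tailor swift", -4.0), ("who is taylor swift", -4.5)]

ranked = rescore_nbest(
    nbest,
    server_lms={"ngram": toy_ngram_lm, "neural": toy_neural_lm},
    weights={"ngram": 0.5, "neural": 0.5},
)
print(ranked[0][0])
```

In this toy setup the server-side scores outweigh the small on-device preference, so the hypothesis with the correct entity spelling moves to the top. In practice the weights would be tuned on held-out data.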

category, n-best, query

Country:
- North America > United States (0.04)

Genre:
- Research Report
  - New Finding (0.68)
  - Experimental Study (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (1.00)
  - Representation & Reasoning (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.88)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
