Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition

Chan, David M., Ghosh, Shalini, Rastrow, Ariya, Hoffmeister, Björn

Jan-6-2023–arXiv.org Artificial Intelligence

... speech encoders, for downstream applications that often (a) have fewer labeled training examples, and (b) rapidly evolving distributions of speech data. The traditional approach to this problem is to frequently collect fresh data, which can be used to re-train and specialize models, leveraging tools such as domain-prompts [1], incremental-learning [2], knowledge distillation [3], hand-written grammars [4], or metric learning [5, 6] to reduce the impact of re-training the model for the downstream application. Unfortunately, for data that changes on a rapid basis, such as product listings or applications requiring ...

Here are the key highlights of our approach: first, we generate a key-value external knowledge store that maps an audio representation of each text element of the catalog (usually consisting of 1M-10M examples) to a semantic representation of the text. Next, we train a model that leverages this external store by attending over retrieved key/value pairs, which we retrieve through approximate k-nearest neighbors. Relying on an external, constant, and off-policy key-value store means that this store can be updated during specialization, requiring only an updated list of phrases for each new model instead of additional fine-tuning.
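The retrieve-then-attend idea in the excerpt can be illustrated with a small toy sketch. This is not the authors' implementation: the embedding functions `toy_audio_embed` and `toy_text_embed` are hypothetical stand-ins for real audio and text encoders, the catalog phrases are invented, and an exact nearest-neighbor search replaces the approximate k-nearest-neighbor retrieval described above so that the example stays dependency-free.

```python
import math
import random

DIM = 16  # hypothetical embedding size, chosen only for this toy example


def _unit_vector(seed: str) -> list[float]:
    """Deterministic pseudo-random unit vector; a stand-in for a real encoder."""
    rng = random.Random(seed)
    vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def toy_audio_embed(phrase: str) -> list[float]:
    """Hypothetical audio representation of a catalog phrase (the key)."""
    return _unit_vector("audio:" + phrase)


def toy_text_embed(phrase: str) -> list[float]:
    """Hypothetical semantic representation of the phrase text (the value)."""
    return _unit_vector("text:" + phrase)


def build_store(phrases):
    """Build the external key-value store: audio keys, text values."""
    keys = [toy_audio_embed(p) for p in phrases]
    values = [toy_text_embed(p) for p in phrases]
    return keys, values


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve_top_k(query, keys, k):
    """Exact k-nearest neighbors by dot product (assumption: a real system
    would use an approximate index over millions of entries)."""
    ranked = sorted(range(len(keys)), key=lambda i: dot(query, keys[i]), reverse=True)
    return ranked[:k]


def attend(query, keys, values):
    """Scaled dot-product attention over the retrieved key/value pairs."""
    scale = math.sqrt(len(query))
    scores = [dot(query, key) / scale for key in keys]
    max_score = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(len(values[0]))]
    return weights, context


# Invented catalog phrases, used only for illustration.
catalog = ["blue running shoes", "red rain jacket", "steel water bottle", "yoga mat"]
keys, values = build_store(catalog)

# Pretend the speech encoder produced this query for an utterance.
query = toy_audio_embed("blue running shoes")
top = retrieve_top_k(query, keys, k=3)
weights, context = attend(query, [keys[i] for i in top], [values[i] for i in top])

# Updating the store needs only the new phrase, not any model re-training.
new_phrase = "wireless earbuds"
keys.append(toy_audio_embed(new_phrase))
values.append(toy_text_embed(new_phrase))
top_after = retrieve_top_k(toy_audio_embed(new_phrase), keys, k=1)
```

In this sketch the store is plain data held outside the attention function, which is why appending a phrase changes what can be retrieved without touching any other code. That mirrors, in miniature, the claim that specialization requires only an updated list of phrases.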


