Using External Off-Policy Speech-To-Text Mappings in Contextual End-To-End Automated Speech Recognition
Chan, David M., Ghosh, Shalini, Rastrow, Ariya, Hoffmeister, Björn
arXiv.org Artificial Intelligence
[...] speech encoders, for downstream applications that often (a) have fewer labeled training examples, and (b) rapidly evolving distributions of speech data. The traditional approach to this problem is to frequently collect fresh data, which can be used to re-train and specialize models, leveraging tools such as domain-prompts [1], incremental-learning [2], knowledge distillation [3], hand-written grammars [4], or metric learning [5, 6] to reduce the impact of re-training the model for the downstream application. Unfortunately, for data that changes on a rapid basis, such as product listings or applications requiring [...]

Here are the key highlights of our approach: first, we generate a key-value external knowledge store that maps an audio representation of each text element of the catalog (usually consisting of 1M-10M examples) to a semantic representation of the text. Next, we train a model that leverages this external store by attending over retrieved key/value pairs, which we retrieve through approximate k-nearest neighbors. Relying on an external, constant, and off-policy key-value store means that this store can be updated during specialization, requiring only an updated list of phrases for each new model instead of additional fine-tuning.
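The retrieve-then-attend idea described above can be illustrated with a small sketch. This is not the authors' implementation. It assumes random unit vectors in place of real audio-derived keys and text-derived values, uses exact nearest-neighbor search as a stand-in for approximate k-nearest neighbors, and all names (`top_k`, `attend`, `rand_unit_vec`) are hypothetical.

```python
import math
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def top_k(query, keys, k):
    """Indices of the k keys most similar to the query.

    Exact search, used here only as a stand-in for approximate kNN.
    """
    scores = [dot(query, key) for key in keys]
    return sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]


def attend(query, keys, values, k):
    """Softmax attention restricted to the k retrieved key/value pairs."""
    idx = top_k(query, keys, k)
    scale = math.sqrt(len(query))
    logits = [dot(query, keys[i]) / scale for i in idx]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    context = [sum(w * values[i][d] for w, i in zip(weights, idx))
               for d in range(dim)]
    return context, idx, weights


rng = random.Random(0)


def rand_unit_vec(dim):
    """Random unit-length vector (placeholder for a learned embedding)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


# Toy store: 100 catalog entries (the text mentions 1M-10M in practice).
keys = [rand_unit_vec(8) for _ in range(100)]    # assumed audio-side keys
values = [rand_unit_vec(4) for _ in range(100)]  # assumed text-side values

# A query that matches entry 42 retrieves it first.
query = list(keys[42])
context, idx, weights = attend(query, keys, values, k=5)

# Updating the catalog only appends to the store; nothing is retrained.
new_key = rand_unit_vec(8)
keys.append(new_key)
values.append(rand_unit_vec(4))
new_context, new_idx, new_weights = attend(list(new_key), keys, values, k=5)
```

In this sketch the last four lines stand in for the specialization step: the new catalog entry becomes retrievable as soon as it is appended, while the functions that play the role of the model are left unchanged.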
Jan-6-2023
- Country:
- North America > United States > California > Alameda County > Berkeley (0.04)
- Genre:
- Research Report (1.00)
- Technology: