Chen, I-Fan
Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition
Yu, Yu, Yang, Chao-Han Huck, Kolehmainen, Jari, Shivakumar, Prashanth G., Gu, Yile, Ryu, Sungho, Ren, Roger, Luo, Qi, Gourav, Aditya, Chen, I-Fan, Liu, Yi-Chieh, Dinh, Tuan, Gandhe, Ankur, Filimonov, Denis, Ghosh, Shalini, Stolcke, Andreas, Rastrow, Ariya, Bulyko, Ivan
We propose a neural language modeling system based on low-rank adaptation (LoRA) for speech recognition output rescoring. Although pretrained language models (LMs) like BERT have shown superior performance in second-pass rescoring, the high computational cost of scaling up the pretraining stage and adapting the pretrained models to specific domains limits their practical use in rescoring. Here we present a method based on low-rank decomposition to train a rescoring BERT model and adapt it to new domains using only a fraction (0.08%) of the pretrained parameters. These inserted matrices are optimized through a discriminative training objective along with a correlation-based regularization loss. The proposed low-rank adaptation RescoreBERT (LoRB) architecture is evaluated on LibriSpeech and internal datasets with decreased training times by factors between 5.4 and 3.6.
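The abstract describes LoRA as freezing all pretrained weights and inserting a trainable pair of matrices, acting as a low-rank decomposition of a full weight matrix, additively into each layer. The following is a minimal PyTorch sketch of that idea; the class name LoRALinear, the rank r, and the scaling factor alpha are illustrative choices, not details taken from the paper.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank additive update.

    Illustrative sketch of the general LoRA formulation
    W x + (alpha / r) * B A x; hyperparameter values are placeholders.
    """

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # freeze all pretrained parameters
        # Trainable pair of matrices forming the low-rank decomposition.
        # B starts at zero so the model initially matches the pretrained one.
        self.lora_a = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, r))
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Additive update: frozen path plus scaled low-rank path.
        return self.base(x) + self.scale * (x @ self.lora_a.T @ self.lora_b.T)

# Example: wrap a BERT-sized projection; only lora_a and lora_b get gradients.
layer = LoRALinear(nn.Linear(768, 768), r=8)
out = layer(torch.randn(2, 768))
```

Because only the two small matrices are trained, the trainable parameter count per wrapped layer drops from in_features * out_features to r * (in_features + out_features), which is how adapting a rescoring BERT with a fraction of the pretrained parameters becomes feasible.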
Toward Fairness in Speech Recognition: Discovery and mitigation of performance disparities
Dheram, Pranav, Ramakrishnan, Murugesan, Raju, Anirudh, Chen, I-Fan, King, Brian, Powell, Katherine, Saboowala, Melissa, Shetty, Karan, Stolcke, Andreas
As with other forms of AI, speech recognition has recently been examined with respect to performance disparities across different user cohorts. One approach to achieve fairness in speech recognition is to (1) identify speaker cohorts that suffer from subpar performance and (2) apply fairness mitigation measures targeting the cohorts discovered. In this paper, we report on initial findings with both discovery and mitigation of performance disparities using data from a product-scale AI assistant speech recognition system. We compare cohort discovery based on geographic and demographic information to a more scalable method that groups speakers without human labels, using speaker embedding technology. For fairness mitigation, we find that oversampling of underrepresented cohorts, as well as modeling speaker cohort membership by additional input variables, reduces the gap between top- and bottom-performing cohorts, without deteriorating overall recognition accuracy.
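The mitigation recipe outlined in the abstract, label-free cohort discovery via speaker embeddings followed by oversampling of underperforming cohorts, could be sketched as below. This is an illustrative outline assuming scikit-learn; the embedding dimensionality, cluster count, oversampling factor, and placeholder data are assumptions, not values from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical inputs: one embedding and one word error rate per speaker.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((1000, 256))  # speaker embeddings (placeholder)
wers = rng.random(1000)                        # per-speaker WER (placeholder)

# (1) Discover cohorts without human labels by clustering the embeddings.
n_cohorts = 20
cohorts = KMeans(n_clusters=n_cohorts, n_init=10, random_state=0).fit_predict(embeddings)

# (2) Rank cohorts by mean error rate and oversample the worst performers.
cohort_wer = np.array([wers[cohorts == c].mean() for c in range(n_cohorts)])
worst = np.argsort(cohort_wer)[-5:]                      # bottom-performing cohorts
weights = np.where(np.isin(cohorts, worst), 3.0, 1.0)    # illustrative 3x oversampling
sample_probs = weights / weights.sum()
resampled = rng.choice(len(embeddings), size=len(embeddings), p=sample_probs)
```

The resampled indices would then drive training-data selection, giving underperforming cohorts more weight without discarding any data, which is consistent with the paper's finding that the top-to-bottom cohort gap can be narrowed without hurting overall accuracy.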