Effective Text Adaptation for LLM-based ASR through Soft Prompt Fine-Tuning
Ma, Yingyi, Liu, Zhe, Kalinli, Ozlem
The advent of Large Language Models (LLMs) has reshaped Automatic Speech Recognition (ASR): prompting an LLM with audio embeddings to generate transcriptions is the new state of the art. Despite LLMs being trained on extensive text corpora, high-quality domain-specific text data can still significantly enhance ASR performance on domain adaptation tasks. Although LLM-based ASR can naturally incorporate more text corpora by fine-tuning the LLM decoder, fine-tuning such ASR on text-only data without paired prompts may diminish the effectiveness of domain-specific knowledge. To mitigate this issue, we propose a two-step soft prompt fine-tuning strategy that enhances domain-specific text adaptation. Experimental results show that text adaptation with the proposed method achieves up to a relative 9% Word Error Rate (WER) reduction and up to an 18% Entity Error Rate (EER) reduction on the target domain compared to the baseline ASR. Combining this with domain-specific Language Model (LM) fusion further improves the EER by a relative 2-5%.
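As a rough illustration of the underlying idea (not the authors' implementation), soft prompt tuning prepends a small set of trainable embedding vectors to the model's input sequence, standing in for the absent audio prompt during text-only adaptation; only these vectors are updated while the rest of the model stays frozen. A minimal numpy sketch, with all names hypothetical:

```python
import numpy as np

def prepend_soft_prompt(token_embeddings, soft_prompt):
    """Prepend trainable soft-prompt vectors to a sequence of token embeddings.

    token_embeddings: (seq_len, d_model) embeddings of a text-only sample
    soft_prompt:      (prompt_len, d_model) trainable vectors acting as a
                      stand-in prompt during text-only fine-tuning
    Returns an array of shape (prompt_len + seq_len, d_model).
    """
    return np.concatenate([soft_prompt, token_embeddings], axis=0)

rng = np.random.default_rng(0)
d_model, prompt_len, seq_len = 16, 4, 10
soft_prompt = rng.normal(size=(prompt_len, d_model))   # the only trainable part
tokens = rng.normal(size=(seq_len, d_model))           # frozen token embeddings

inputs = prepend_soft_prompt(tokens, soft_prompt)
print(inputs.shape)  # (14, 16)
```

In an actual LLM decoder the concatenated sequence would be fed through the frozen transformer, and gradients from the language-modeling loss would flow back only into `soft_prompt`.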
Text-only Domain Adaptation using Unified Speech-Text Representation in Transducer
Huang, Lu, Li, Boyu, Zhang, Jun, Lu, Lu, Ma, Zejun
Domain adaptation using a text-only corpus is challenging in end-to-end (E2E) speech recognition, and adaptation by synthesizing audio from text through TTS is resource-consuming. We present a method to learn a Unified Speech-Text Representation in Conformer Transducer (USTR-CT), enabling fast domain adaptation using a text-only corpus. Unlike the previous textogram method, our work introduces an extra text encoder to learn the text representation; it is removed during inference, so no modification is needed for online deployment. To improve the efficiency of adaptation, single-step and multi-step adaptations are also explored. Experiments on adapting LibriSpeech to SPGISpeech show the proposed method reduces the word error rate (WER) by a relative 44% on the target domain, outperforming both the TTS method and the textogram method. We also show the proposed method can be combined with internal language model estimation (ILME) to further improve performance.
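To illustrate the deployment-friendly design described above, here is a toy sketch (hypothetical names, not the paper's code) of a train-only text encoder: during adaptation, text features are projected into the same representation space the speech encoder produces, so text-only data can train the shared transducer; at inference the text branch is simply dropped and the speech path is unchanged.

```python
import numpy as np

class USTRSketch:
    """Toy sketch of the unified speech-text idea: a text encoder maps text
    features into the shared space during training and is removed at
    inference, leaving the deployed speech path untouched."""

    def __init__(self, d_text, d_shared, seed=0):
        rng = np.random.default_rng(seed)
        # Text encoder parameters: used only during text-only adaptation.
        self.text_proj = rng.normal(size=(d_text, d_shared))

    def encode_text(self, text_feats):
        # Training path: project text-only data into the shared space,
        # where it can drive the same transducer loss as speech.
        return text_feats @ self.text_proj

    def encode_speech(self, speech_feats):
        # Inference path: speech features already live in the shared space,
        # so online deployment requires no modification.
        return speech_feats

model = USTRSketch(d_text=8, d_shared=16)
text_repr = model.encode_text(np.ones((5, 8)))     # (5, 16), train-time only
speech_repr = model.encode_speech(np.zeros((3, 16)))  # (3, 16), unchanged path
```

A real system would use a learned neural text encoder rather than a single projection, but the shape of the idea is the same: both modalities land in one representation space, and only the speech branch ships.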
Women Innovators And Researchers Who Made A Difference In AI In 2021
There is a troubling and persistent absence of women in the fields of artificial intelligence and data science. Women constitute a mere 22 per cent, less than a quarter, of professionals in the field, according to the report "Where are the women?". Yet, despite low participation and obstacles, women are breaking silos and setting an example in the field of AI. To honour their commitment and work, we have listed some of the women innovators and researchers who have worked tirelessly and contributed significantly to AI and data science. The list below is in no particular order.