Goto

Collaborating Authors

 language expansion


Language-Aware Prompt Tuning for Parameter-Efficient Seamless Language Expansion in Multilingual ASR

arXiv.org Artificial Intelligence

Recent advancements in multilingual automatic speech recognition (ASR) have been driven by large-scale end-to-end models like Whisper. However, challenges such as language interference and expanding to unseen languages (language expansion) without degrading performance persist. This paper addresses these with three contributions: 1) Entire Soft Prompt Tuning (Entire SPT), which applies soft prompts to both the encoder and decoder, enhancing feature extraction and decoding; 2) Language-A ware Prompt Tuning (LAPT), which leverages cross-lingual similarities to encode shared and language-specific features using lightweight prompt matrices; 3) SPT - Whisper, a toolkit that integrates SPT into Whisper and enables efficient continual learning. Experiments across three languages from FLEURS demonstrate that Entire SPT and LAPT outperform Decoder SPT by 5.0% and 16.0% in language expansion tasks, respectively, providing an efficient solution for dynamic, multilingual ASR models with minimal computational overhead.


Task Arithmetic for Language Expansion in Speech Translation

arXiv.org Artificial Intelligence

Recent advances in large language models (LLMs) have gained interest in speech-text multimodal foundation models, achieving strong performance on instruction-based speech translation (ST). However, expanding language pairs from an existing instruction-tuned ST system is costly due to the necessity of re-training on a combination of new and previous datasets. We propose to expand new language pairs by merging the model trained on new language pairs and the existing model, using task arithmetic. We find that the direct application of task arithmetic for ST causes the merged model to fail to follow instructions; thus, generating translation in incorrect languages. To eliminate language confusion, we propose an augmented task arithmetic method that merges an additional language control model. It is trained to generate the correct target language token following the instructions. Our experiments demonstrate that our proposed language control model can achieve language expansion by eliminating language confusion. In our MuST-C and CoVoST-2 experiments, it shows up to 4.66 and 4.92 BLEU scores improvement, respectively. In addition, we demonstrate the use of our task arithmetic framework can expand to a language pair where neither paired ST training data nor a pre-trained ST model is available. We first synthesize the ST system from machine translation (MT) systems via task analogy, then merge the synthesized ST system to the existing ST model.


LoRA-Whisper: Parameter-Efficient and Extensible Multilingual ASR

arXiv.org Artificial Intelligence

When new languages need to be integrated into a multilingual ASR system, a naive Recent years have witnessed significant progress in multilingual approach is to fine-tune the ASR model using data from these automatic speech recognition (ASR), driven by the emergence new languages. Unfortunately, this often results in catastrophic of end-to-end (E2E) models and the scaling of multilingual forgetting, referring to the phenomenon that the recognition performance datasets. Despite that, two main challenges persist in multilingual of base languages tends to decline. To solve the above ASR: language interference and the incorporation of problem, Li et al. [26] proposes lifelong learning [27] solution new languages without degrading the performance of the existing which remedies the language interference problem by mixing ones. This paper proposes LoRA-Whisper, which incorporates base language data and new language data. However, this approach LoRA matrix into Whisper for multilingual ASR, is inefficient and time-consuming. Libera et al. [28] explores effectively mitigating language interference. Furthermore, by various continual learning methods [29-34] to address leveraging LoRA and the similarities between languages, we the issue of catastrophic forgetting. While these approaches can achieve better performance on new languages while upholding have helped alleviate the problem, it still persists.