AITopics | contextual adapter

Collaborating Authors

contextual adapter

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

An Effective Context-Balanced Adaptation Approach for Long-Tailed Speech Recognition

Wang, Yi-Cheng, Pai, Li-Ting, Yan, Bi-Cheng, Wang, Hsin-Wei, Lin, Chi-Han, Chen, Berlin

arXiv.org Artificial IntelligenceSep-10-2024

End-to-end (E2E) automatic speech recognition (ASR) models have become standard practice for various commercial applications. However, in real-world scenarios, the long-tailed nature of word distribution often leads E2E ASR models to perform well on common words but fall short in recognizing uncommon ones. Recently, the notion of a contextual adapter (CA) was proposed to infuse external knowledge represented by a context word list into E2E ASR models. Although CA can improve recognition performance on rare words, two crucial data imbalance problems remain. First, when using low-frequency words as context words during training, since these words rarely occur in the utterance, CA becomes prone to overfit on attending to the token due to higher-frequency words not being present in the context list. Second, the long-tailed distribution within the context list itself still causes the model to perform poorly on low-frequency context words. In light of this, we explore in-depth the impact of altering the context list to have words with different frequency distributions on model performance, and meanwhile extend CA with a simple yet effective context-balanced learning objective. A series of experiments conducted on the AISHELL-1 benchmark dataset suggests that using all vocabulary words from the training corpus as the context list and pairing them with our balanced objective yields the best performance, demonstrating a significant reduction in character error rate (CER) by up to 1.21% and a more pronounced 9.44% reduction in the error rate of zero-shot words.

adapter, context list, context word, (16 more...)

arXiv.org Artificial Intelligence

2409.06468

Country: Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Improving ASR Contextual Biasing with Guided Attention

Tang, Jiyang, Kim, Kwangyoun, Shon, Suwon, Wu, Felix, Sridhar, Prashant, Watanabe, Shinji

arXiv.org Artificial IntelligenceJan-16-2024

In this paper, we propose a Guided Attention (GA) auxiliary training loss, which improves the effectiveness and robustness of automatic speech recognition (ASR) contextual biasing without introducing additional parameters. A common challenge in previous literature is that the word error rate (WER) reduction brought by contextual biasing diminishes as the number of bias phrases increases. To address this challenge, we employ a GA loss as an additional training objective besides the Transducer loss. The proposed GA loss aims to teach the cross attention how to align bias phrases with text tokens or audio frames. Compared to studies with similar motivations, the proposed loss operates directly on the cross attention weights and is easier to implement. Through extensive experiments based on Conformer Transducer with Contextual Adapter, we demonstrate that the proposed method not only leads to a lower WER but also retains its effectiveness as the number of bias phrases increases. Specifically, the GA loss decreases the WER of rare vocabularies by up to 19.2% on LibriSpeech compared to the contextual biasing baseline, and up to 49.3% compared to a vanilla Transducer.

bias phrase, contextual, contextual adapter, (13 more...)

arXiv.org Artificial Intelligence

2401.08835

Country:

South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.89)

Add feedback

Multilingual Contextual Adapters To Improve Custom Word Recognition In Low-resource Languages

Kulshreshtha, Devang, Dingliwal, Saket, Houston, Brady, Bodapati, Sravan

arXiv.org Artificial IntelligenceJul-3-2023

Connectionist Temporal Classification (CTC) models are popular for their balance between speed and performance for Automatic Speech Recognition (ASR). However, these CTC models still struggle in other areas, such as personalization towards custom words. A recent approach explores Contextual Adapters, wherein an attention-based biasing model for CTC is used to improve the recognition of custom entities. While this approach works well with enough data, we showcase that it isn't an effective strategy for low-resource languages. In this work, we propose a supervision loss for smoother training of the Contextual Adapters. Further, we explore a multilingual strategy to improve performance with limited training data. Our method achieves 48% F1 improvement in retrieving unseen custom entities for a low-resource language. Interestingly, as a by-product of training the Contextual Adapters, we see a 5-11% Word Error Rate (WER) reduction in the performance of the base CTC model as well.

artificial intelligence, machine learning, natural language, (14 more...)

arXiv.org Artificial Intelligence

2307.00759

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

Add feedback

Dialog act guided contextual adapter for personalized speech recognition

Chang, Feng-Ju, Muniyappa, Thejaswi, Sathyendra, Kanthashree Mysore, Wei, Kai, Strimel, Grant P., McGowan, Ross

arXiv.org Artificial IntelligenceMar-31-2023

Personalization in multi-turn dialogs has been a long standing challenge for end-to-end automatic speech recognition (E2E ASR) models. Recent work on contextual adapters has tackled rare word recognition using user catalogs. This adaptation, however, does not incorporate an important cue, the dialog act, which is available in a multi-turn dialog scenario. In this work, we propose a dialog act guided contextual adapter network. Specifically, it leverages dialog acts to select the most relevant user catalogs and creates queries based on both -- the audio as well as the semantic relationship between the carrier phrase and user catalogs to better guide the contextual biasing. On industrial voice assistant datasets, our model outperforms both the baselines - dialog act encoder-only model, and the contextual adaptation, leading to the most improvement over the no-context model: 58% average relative word error rate reduction (WERR) in the multi-turn dialog scenario, in comparison to the prior-art contextual adapter, which has achieved 39% WERR over the no-context model.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2303.17799

Country: North America > United States (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Towards Personalization of CTC Speech Recognition Models with Contextual Adapters and Adaptive Boosting

Dingliwal, Saket, Sunkara, Monica, Bodapati, Sravan, Ronanki, Srikanth, Farris, Jeff, Kirchhoff, Katrin

arXiv.org Artificial IntelligenceNov-13-2022

End-to-end speech recognition models trained using joint Connectionist Temporal Classification (CTC)-Attention loss have gained popularity recently. In these models, a non-autoregressive CTC decoder is often used at inference time due to its speed and simplicity. However, such models are hard to personalize because of their conditional independence assumption that prevents output tokens from previous time steps to influence future predictions. To tackle this, we propose a novel two-way approach that first biases the encoder with attention over a predefined list of rare long-tail and out-of-vocabulary (OOV) words and then uses dynamic boosting and phone alignment network during decoding to further bias the subword predictions. We evaluate our approach on open-source VoxPopuli and in-house medical datasets to showcase a 60% improvement in F1 score on domain-specific rare words over a strong CTC baseline.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2210.0951

Country: Europe (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.82)

Add feedback