AITopics | internal lm

Collaborating Authors

internal lm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Decoupled Structure for Improved Adaptability of End-to-End Models

Deng, Keqi, Woodland, Philip C.

arXiv.org Artificial IntelligenceAug-25-2023

Although end-to-end (E2E) trainable automatic speech recognition (ASR) has shown great success by jointly learning acoustic and linguistic information, it still suffers from the effect of domain shifts, thus limiting potential applications. The E2E ASR model implicitly learns an internal language model (LM) which characterises the training distribution of the source domain, and the E2E trainable nature makes the internal LM difficult to adapt to the target domain with text-only data To solve this problem, this paper proposes decoupled structures for attention-based encoder-decoder (Decoupled-AED) and neural transducer (Decoupled-Transducer) models, which can achieve flexible domain adaptation in both offline and online scenarios while maintaining robust intra-domain performance. To this end, the acoustic and linguistic parts of the E2E model decoder (or prediction network) are decoupled, making the linguistic component (i.e. internal LM) replaceable. When encountering a domain shift, the internal LM can be directly replaced during inference by a target-domain LM, without re-training or using domain-specific paired speech-text data. Experiments for E2E ASR models trained on the LibriSpeech-100h corpus showed that the proposed decoupled structure gave 15.1% and 17.2% relative word error rate reductions on the TED-LIUM 2 and AESRC2020 corpora while still maintaining performance on intra-domain data.

internal lm, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2308.13345

Country:

Asia > Singapore (0.05)
South America > Colombia > Bolivar Department > Cartagena (0.04)
Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
(27 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Modular Hybrid Autoregressive Transducer

Meng, Zhong, Chen, Tongzhou, Prabhavalkar, Rohit, Zhang, Yu, Wang, Gary, Audhkhasi, Kartik, Emond, Jesse, Strohman, Trevor, Ramabhadran, Bhuvana, Huang, W. Ronny, Variani, Ehsan, Huang, Yinghui, Moreno, Pedro J.

arXiv.org Artificial IntelligenceFeb-16-2023

Text-only adaptation of a transducer model remains challenging for end-to-end speech recognition since the transducer has no clearly separated acoustic model (AM), language model (LM) or blank model. In this work, we propose a modular hybrid autoregressive transducer (MHAT) that has structurally separated label and blank decoders to predict label and blank distributions, respectively, along with a shared acoustic encoder. The encoder and label decoder outputs are directly projected to AM and internal LM scores and then added to compute label posteriors. We train MHAT with an internal LM loss and a HAT loss to ensure that its internal LM becomes a standalone neural LM that can be effectively adapted to text. Moreover, text adaptation of MHAT fosters a much better LM fusion than internal LM subtraction-based methods. On Google's large-scale production data, a multi-domain MHAT adapted with 100B sentences achieves relative WER reductions of up to 12.4% without LM fusion and 21.5% with LM fusion from 400K-hour trained HAT.

machine learning, mhat, natural language, (17 more...)

arXiv.org Artificial Intelligence

2210.17049

Country: North America > United States (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Internal language model estimation through explicit context vector learning for attention-based encoder-decoder ASR

Liu, Yufei, Ma, Rao, Xu, Haihua, He, Yi, Ma, Zejun, Zhang, Weibin

arXiv.org Artificial IntelligenceJan-26-2022

An end-to-end (E2E) speech recognition model implicitly learns a biased internal language model (ILM) during training. To fused an external LM during inference, the scores produced by the biased ILM need to be estimated and subtracted. In this paper we propose two novel approaches to estimate the biased ILM based on Listen-Attend-Spell (LAS) models. The simpler method is to replace the context vector of the LAS decoder at every time step with a learnable vector. The other more advanced method is to use a simple feed-forward network to directly map query vectors to context vectors, making the generation of the context vectors independent of the LAS encoder. Both the learnable vector and the mapping network are trained on the transcriptions of the training data to minimize the perplexity while all the other parameters of the LAS model is fixed. Experiments show that the ILMs estimated by the proposed methods achieve the lowest perplexity. In addition, they also significantly outperform the shallow fusion method and two previously proposed Internal Language Model Estimation (ILME) approaches on multiple datasets.

context vector, decoder, encoder, (11 more...)

arXiv.org Artificial Intelligence

2201.11627

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Singapore (0.04)
Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.81)

Add feedback

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

Meng, Zhong, Gaur, Yashesh, Kanda, Naoyuki, Li, Jinyu, Chen, Xie, Wu, Yu, Gong, Yifan

arXiv.org Artificial IntelligenceOct-14-2021

Text-only adaptation of an end-to-end (E2E) model remains a challenging task for automatic speech recognition (ASR). Language model (LM) fusion-based approaches require an additional external LM during inference, significantly increasing the computation cost. To overcome this, we propose an internal LM adaptation (ILMA) of the E2E model using text-only data. Trained with audio-transcript pairs, an E2E model implicitly learns an internal LM that characterizes the token sequence probability which is approximated by the E2E model output after zeroing out the encoder contribution. During ILMA, we fine-tune the internal LM, i.e., the E2E components excluding the encoder, to minimize a cross-entropy loss. To make ILMA effective, it is essential to train the E2E model with an internal LM loss besides the standard E2E loss. Furthermore, we propose to regularize ILMA by minimizing the Kullback-Leibler divergence between the output distributions of the adapted and unadapted internal LMs. ILMA is the most effective when we update only the last linear layer of the joint network. ILMA enables a fast text-only adaptation of the E2E model without increasing the run-time computational cost. Experimented with 30K-hour trained transformer transducer models, ILMA achieves up to 34.9% relative word error rate reduction from the unadapted baseline.

ilma, internal lm, text-only data, (11 more...)

arXiv.org Artificial Intelligence

2110.05354

Country: North America > United States > Washington > King County > Redmond (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models

Zeineldeen, Mohammad, Glushko, Aleksandr, Michel, Wilfried, Zeyer, Albert, Schlüter, Ralf, Ney, Hermann

arXiv.org Machine LearningApr-12-2021

Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. The integration with an external LM trained on much more unpaired text usually leads to better performance. A Bayesian interpretation as in the hybrid autoregressive transducer (HAT) suggests dividing by the prior of the discriminative acoustic model, which corresponds to this implicit LM, similarly as in the hybrid hidden Markov model approach. The implicit LM cannot be calculated efficiently in general and it is yet unclear what are the best methods to estimate it. In this work, we compare different approaches from the literature and propose several novel methods to estimate the ILM directly from the AED model. Our proposed methods outperform all previous approaches. We also investigate other methods to suppress the ILM mainly by decreasing the capacity of the AED model, limiting the label context, and also by training the AED model together with a pre-existing LM.

aed model, decoder, speech recognition, (12 more...)

arXiv.org Machine Learning

2104.05544

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)
(3 more...)

Genre: Research Report (0.84)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.82)

Add feedback

Librispeech Transducer Model with Internal Language Model Prior Correction

Zeyer, Albert, Merboldt, André, Michel, Wilfried, Schlüter, Ralf, Ney, Hermann

arXiv.org Artificial IntelligenceApr-7-2021

We present our transducer model on Librispeech. We study variants to include an external language model (LM) with shallow fusion and subtract an estimated internal LM. This is justified by a Bayesian interpretation where the transducer model prior is given by the estimated internal LM. The subtraction of the internal LM gives us over 14% relative improvement over normal shallow fusion. Our transducer has a separate probability distribution for the non-blank labels which allows for easier combination with the external LM, and easier estimation of the internal LM. We additionally take care of including the end-of-sentence (EOS) probability of the external LM in the last blank probability which further improves the performance. All our code and setups are published.

shallow fusion, speech recognition, transducer, (11 more...)

arXiv.org Artificial Intelligence

2104.03006

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.05)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
(6 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.99)

Add feedback