AITopics | aed model

aed model

TurboBias: Universal ASR Context-Biasing powered by GPU-accelerated Phrase-Boosting Tree

Andrusenko, Andrei, Bataev, Vladimir, Grigoryan, Lilit, Lavrukhin, Vitaly, Ginsburg, Boris

arXiv.org Artificial Intelligence, Aug-13-2025

Recognizing specific key phrases is an essential task for contextualized Automatic Speech Recognition (ASR). However, most existing context-biasing approaches require additional model training, significantly slow down the decoding process, or constrain the choice of the ASR system type. This paper proposes a universal ASR context-biasing framework that supports all major types: CTC, Transducers, and Attention Encoder-Decoder models. The framework is based on a GPU-accelerated word boosting tree, which enables it to be used in shallow fusion mode for greedy and beam search decoding without noticeable speed degradation, even with a vast number of key phrases (up to 20K items). The results show that the proposed method is highly efficient, surpassing the considered open-source context-biasing approaches in accuracy and decoding speed. Our context-biasing framework is open-sourced as a part of the NeMo toolkit. Modern end-to-end automatic speech recognition (ASR) systems, such as Connectionist Temporal Classification (CTC) [1], Recurrent Neural Transducer (RNN-T) [2], and Attention Encoder-Decoder (AED) [3], already achieve relatively high speech recognition accuracy in common data domains [4].

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2508.07014

Genre: Research Report (0.90)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
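
The boosting-tree idea in the TurboBias abstract above can be illustrated with a small CPU-only sketch. It is not the NeMo implementation and does not reproduce the GPU acceleration; the class names, the word-level tokens, the `bonus_per_token` value, and the rule that retracts the bonus of an unfinished match are all assumptions made for illustration. During shallow fusion, the returned score delta would be added (with some weight) to the ASR log-probability of each candidate token.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical trie node: children keyed by token, flag for phrase end.
    children: dict = field(default_factory=dict)
    is_end: bool = False


class BoostingTree:
    """Toy phrase-boosting trie (illustrative, not the NeMo API)."""

    def __init__(self, phrases, bonus_per_token=1.0):
        self.root = Node()
        self.bonus = bonus_per_token
        for phrase in phrases:
            node = self.root
            for token in phrase:
                node = node.children.setdefault(token, Node())
            node.is_end = True

    def start(self):
        # State = (current node, number of boosted tokens not yet
        # confirmed by reaching the end of a key phrase).
        return (self.root, 0)

    def step(self, state, token):
        """Advance by one token; return (new_state, score_delta)."""
        node, pending = state
        child = node.children.get(token)
        delta = 0.0
        if child is None:
            # Mismatch: take back the bonus of the unfinished partial
            # match, then try to start a new phrase with this token.
            delta -= self.bonus * pending
            pending = 0
            child = self.root.children.get(token)
        if child is None:
            return (self.root, 0), delta
        pending = 0 if child.is_end else pending + 1
        return (child, pending), delta + self.bonus


def score(tree, tokens):
    """Total boosting score accumulated over a token sequence."""
    state, total = tree.start(), 0.0
    for token in tokens:
        state, delta = tree.step(state, token)
        total += delta
    return total
```

With the key phrases `("nvidia", "nemo")` and `("turbo", "bias")`, a hypothesis containing the full phrase "nvidia nemo" keeps a bonus of 2.0, while "nvidia gpu" ends at 0.0 because the partial match is retracted.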

Enhanced Hybrid Transducer and Attention Encoder Decoder with Text Data

Tang, Yun, Kim, Eesung, Apsingekar, Vijendra Raj

arXiv.org Artificial Intelligence, Jun-25-2025

A joint speech and text optimization method is proposed for hybrid transducer and attention-based encoder decoder (TAED) modeling to leverage large text corpora and enhance ASR accuracy. The joint TAED (J-TAED) is trained with both speech and text input modalities together, while it takes only speech data as input during inference. The trained model can unify the internal representations from different modalities and can be further extended to text-based domain adaptation. This effectively alleviates data scarcity for mismatched-domain tasks, since no speech data is required. Our experiments show that J-TAED successfully integrates speech and linguistic information into one model and reduces the WER by 5.8~12.8% on the Librispeech dataset. The model is also evaluated on two out-of-domain datasets: one finance-focused and one named-entity-focused. The text-based domain adaptation brings 15.3% and 17.8% WER reduction on those two datasets, respectively.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2506.19159

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

LAMBO: Large AI Model Empowered Edge Intelligence

Dong, Li, Jiang, Feibo, Peng, Yubo, Wang, Kezhi, Yang, Kun, Pan, Cunhua, Schober, Robert

arXiv.org Artificial Intelligence, Aug-3-2024

Next-generation edge intelligence is anticipated to benefit various applications via offloading techniques. However, traditional offloading architectures face several issues, including heterogeneous constraints, partial perception, uncertain generalization, and lack of tractability. In this paper, we propose a Large AI Model-Based Offloading (LAMBO) framework with over one billion parameters for solving these problems. We first use input embedding (IE) to achieve normalized feature representation with heterogeneous constraints and task prompts. Then, we introduce a novel asymmetric encoder-decoder (AED) as the decision-making model, which is an improved transformer architecture consisting of a deep encoder and a shallow decoder for global perception and decision. Next, actor-critic learning (ACL) is used to pre-train the AED for different optimization tasks under corresponding prompts, enhancing the AED's generalization in multi-task scenarios. Finally, we propose an active learning from expert feedback (ALEF) method to fine-tune the decoder of the AED for tracking changes in dynamic environments. Our simulation results validate the advantages of the proposed LAMBO framework.

architecture, lam, mec system, (12 more...)

arXiv.org Artificial Intelligence

2308.15078

Country:

Asia > China (0.05)
Europe > United Kingdom > England > Greater London > London (0.04)
Europe > United Kingdom > England > Essex (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)

Genre: Research Report (0.50)

Industry: Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications > Networks (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Chunked Attention-based Encoder-Decoder Model for Streaming Speech Recognition

Zeineldeen, Mohammad, Zeyer, Albert, Schlüter, Ralf, Ney, Hermann

arXiv.org Machine Learning, Sep-15-2023

We study a streamable attention-based encoder-decoder model in which either the decoder, or both the encoder and decoder, operate on pre-defined, fixed-size windows called chunks. A special end-of-chunk (EOC) symbol advances from one chunk to the next chunk, effectively replacing the conventional end-of-sequence symbol. This modification, while minor, situates our model as equivalent to a transducer model that operates on chunks instead of frames, where EOC corresponds to the blank symbol. We further explore the remaining differences between a standard transducer and our model. Additionally, we examine relevant aspects such as long-form speech generalization, beam size, and length normalization. Through experiments on Librispeech and TED-LIUM-v2, and by concatenating consecutive sequences for long-form trials, we find that our streamable model maintains competitive performance compared to the non-streamable variant and generalizes very well to long-form speech.

artificial intelligence, machine learning, natural language, (21 more...)

arXiv.org Machine Learning

2309.08436

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Germany (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.73)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.69)
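
The role of the end-of-chunk symbol described in the chunked AED abstract above can be sketched as a greedy decoding loop. This is a toy illustration only: the `next_label` callback, its signature, the `"<eoc>"` string, and the `max_labels_per_chunk` safety cap are assumptions, and a real decoder would be a neural network attending to the acoustic frames of the current chunk.

```python
EOC = "<eoc>"  # hypothetical spelling of the end-of-chunk symbol


def chunked_greedy_decode(chunks, next_label, max_labels_per_chunk=20):
    """Greedy decoding where EOC moves on to the next chunk.

    next_label(chunk, prefix, n_in_chunk) is an assumed callback that
    returns either an output label or EOC, given the current chunk,
    all labels emitted so far, and how many were emitted in this chunk.
    """
    output = []
    for chunk in chunks:
        for n_in_chunk in range(max_labels_per_chunk):
            label = next_label(chunk, output, n_in_chunk)
            if label == EOC:
                break  # like a transducer blank: advance in time
            output.append(label)
    # No end-of-sequence symbol is needed: decoding stops when the
    # chunks run out, which is what allows streaming operation.
    return output


def toy_next_label(chunk, prefix, n_in_chunk):
    # Stand-in for a model: each toy chunk already lists its labels.
    return chunk[n_in_chunk] if n_in_chunk < len(chunk) else EOC
```

Calling `chunked_greedy_decode([["hello"], [], ["world", "again"]], toy_next_label)` emits one EOC per chunk, including the empty one, and returns the three labels in order.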

Hybrid Transducer and Attention based Encoder-Decoder Modeling for Speech-to-Text Tasks

Tang, Yun, Sun, Anna Y., Inaguma, Hirofumi, Chen, Xinyue, Dong, Ning, Ma, Xutai, Tomasello, Paden D., Pino, Juan

arXiv.org Artificial Intelligence, May-4-2023

Transducer and Attention based Encoder-Decoder (AED) are two widely used frameworks for speech-to-text tasks. They are designed for different purposes and each has its own benefits and drawbacks for speech-to-text tasks. In order to leverage strengths of both modeling methods, we propose a solution by combining Transducer and Attention based Encoder-Decoder (TAED) for speech-to-text tasks. The new method leverages AED's strength in non-monotonic sequence to sequence learning while retaining Transducer's streaming property. In the proposed framework, Transducer and AED share the same speech encoder. The predictor in Transducer is replaced by the decoder in the AED model, and the outputs of the decoder are conditioned on the speech inputs instead of outputs from an unconditioned language model. The proposed solution ensures that the model is optimized by covering all possible read/write scenarios and creates a matched environment for streaming applications. We evaluate the proposed approach on the MuST-C dataset and the findings demonstrate that TAED performs significantly better than Transducer for offline automatic speech recognition (ASR) and speech-to-text translation (ST) tasks. In the streaming case, TAED outperforms Transducer in the ASR task and one ST direction while comparable results are achieved in another translation direction.

artificial intelligence, speech recognition, transducer, (15 more...)

arXiv.org Artificial Intelligence

2305.03101

Country: South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)

Genre: Research Report > New Finding (0.66)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)

RED-ACE: Robust Error Detection for ASR using Confidence Embeddings

Gekhman, Zorik, Zverinski, Dina, Mallinson, Jonathan, Beryozkin, Genady

arXiv.org Artificial Intelligence, Oct-26-2022

ASR Error Detection (AED) models aim to post-process the output of Automatic Speech Recognition (ASR) systems, in order to detect transcription errors. Modern approaches usually use text-based input, comprised solely of the ASR transcription hypothesis, disregarding additional signals from the ASR model. Instead, we propose to utilize the ASR system's word-level confidence scores for improving AED performance. Specifically, we add an ASR Confidence Embedding (ACE) layer to the AED model's encoder, allowing us to jointly encode the confidence scores and the transcribed text into a contextualized representation. Our experiments show the benefits of ASR confidence scores for AED, their complementary effect over the textual signal, as well as the effectiveness and robustness of ACE for combining these signals. To foster further research, we publish a novel AED dataset consisting of ASR outputs on the LibriSpeech corpus with annotated transcription errors.

confidence score, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2203.07172

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Canada > Ontario > Toronto (0.04)
Oceania > Australia > Queensland > Brisbane (0.04)
(8 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.93)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

Investigating Methods to Improve Language Model Integration for Attention-based Encoder-Decoder ASR Models

Zeineldeen, Mohammad, Glushko, Aleksandr, Michel, Wilfried, Zeyer, Albert, Schlüter, Ralf, Ney, Hermann

arXiv.org Machine Learning, Apr-12-2021

Attention-based encoder-decoder (AED) models learn an implicit internal language model (ILM) from the training transcriptions. Integration with an external LM trained on much more unpaired text usually leads to better performance. A Bayesian interpretation, as in the hybrid autoregressive transducer (HAT), suggests dividing by the prior of the discriminative acoustic model, which corresponds to this implicit LM, as in the hybrid hidden Markov model approach. The implicit LM cannot be calculated efficiently in general, and it is still unclear which methods estimate it best. In this work, we compare different approaches from the literature and propose several novel methods to estimate the ILM directly from the AED model. Our proposed methods outperform all previous approaches. We also investigate other methods to suppress the ILM, mainly by decreasing the capacity of the AED model, limiting the label context, and training the AED model together with a pre-existing LM.

aed model, decoder, speech recognition, (12 more...)

arXiv.org Machine Learning

2104.05544

Country:

Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
Europe > Iceland > Capital Region > Reykjavik (0.04)
(3 more...)

Genre: Research Report (0.84)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.82)
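
The "dividing by the prior" step in the LM-integration abstract above becomes a subtraction in the log domain. The sketch below shows that log-linear combination for a single decoding step; the function names, the scale values, and the toy probabilities are assumptions for illustration, and the hard part the paper addresses (estimating the ILM probabilities from the AED model) is simply taken as given here.

```python
import math


def fused_score(log_p_aed, log_p_ext_lm, log_p_ilm,
                lm_scale=0.5, ilm_scale=0.3):
    """Log-linear fusion: add the external LM, subtract the ILM estimate.

    The two scales are hypothetical values; in practice they would be
    tuned on held-out data.
    """
    return log_p_aed + lm_scale * log_p_ext_lm - ilm_scale * log_p_ilm


def pick_label(p_aed, p_ext_lm, p_ilm, lm_scale=0.5, ilm_scale=0.3):
    """Pick the best next label from per-label probability dicts."""
    return max(
        p_aed,
        key=lambda y: fused_score(
            math.log(p_aed[y]), math.log(p_ext_lm[y]), math.log(p_ilm[y]),
            lm_scale, ilm_scale,
        ),
    )


# Toy next-label distributions (made-up numbers).
p_aed = {"their": 0.5, "there": 0.4, "the": 0.1}
p_ext_lm = {"their": 0.2, "there": 0.7, "the": 0.1}
p_ilm = {"their": 0.6, "there": 0.3, "the": 0.1}
```

With both scales set to zero, `pick_label(p_aed, p_ext_lm, p_ilm, 0.0, 0.0)` follows the AED model alone and returns "their"; with the default scales the external LM and the ILM correction together shift the choice to "there".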

Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models

Garg, Abhinav, Gowda, Dhananjaya, Kumar, Ankur, Kim, Kwangyoun, Kumar, Mehul, Kim, Chanwoo

arXiv.org Machine Learning, Dec-27-2019

Speech Processing Lab, AI Center, Samsung Research, Korea

Abstract: In this paper, we propose a refined multistage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training based on three levels of architectural granularity, namely, character encoder, byte pair encoding (BPE) based encoder, and attention decoder, is proposed. Also, multi-task learning based on two levels of linguistic granularity, namely, character and BPE, is used. We explore different pre-training strategies for the encoders, including transfer learning from a bidirectional encoder. Our models achieve a word error rate (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models, respectively, after fusion with a long short-term memory (LSTM) based external language model (LM).

Index Terms: attention-based encoder-decoder models, online attention, multistage training, multi-task learning

1. Introduction: Recently, attention-based encoder-decoder (AED) models have gained popularity for developing end-to-end neural network based automatic speech recognition (ASR) systems [1, 2, 3]. One of the primary advantages of AED models is that the language information is tightly coupled into the decoder, obviating the need for an external language model (LM). AED models have been shown to perform better than other end-to-end models, namely, connectionist temporal classification (CTC) and recurrent neural network transducer (RNN-T) models [4].

character encoder, encoder, ulstm layer, (15 more...)

arXiv.org Machine Learning

1912.12384

Country:

North America > Canada > Quebec > Montreal (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)