NGPU-LM: GPU-Accelerated N-Gram Language Model for Context-Biasing in Greedy ASR Decoding

Bataev, Vladimir, Andrusenko, Andrei, Grigoryan, Lilit, Laptev, Aleksandr, Lavrukhin, Vitaly, Ginsburg, Boris

May-30-2025–arXiv.org Artificial Intelligence

Statistical n-gram language models are widely used for context-biasing tasks in Automatic Speech Recognition (ASR). However, existing implementations lack computational efficiency due to poor parallelization, making context-biasing less appealing for industrial use. This work rethinks data structures for statistical n-gram language models to enable fast and parallel operations for GPU-optimized inference. Our approach, named NGPU-LM, introduces customizable greedy decoding for all major ASR model types - including transducers, attention encoder-decoder models, and CTC - with less than 7% computational overhead. The proposed approach can eliminate more than 50% of the accuracy gap between greedy and beam search for out-of-domain scenarios while avoiding significant slowdown caused by beam search. The implementation of the proposed NGPU-LM is open-sourced.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

May-30-2025

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.47)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Speech > Speech Recognition (0.72)
  - Representation & Reasoning > Search (0.72)
  - Machine Learning > Neural Networks
    - Deep Learning (0.72)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found