FlexCTC: GPU-powered CTC Beam Decoding With Advanced Contextual Abilities

Grigoryan, Lilit, Bataev, Vladimir, Karpov, Nikolay, Andrusenko, Andrei, Lavrukhin, Vitaly, Ginsburg, Boris

Aug-14-2025–arXiv.org Artificial Intelligence

While beam search improves speech recognition quality over greedy decoding, standard implementations are slow, often sequential, and CPU-bound. To fully leverage modern hardware capabilities, we present a novel open-source Flex-CTC toolkit for fully GPU-based beam decoding, designed for Connectionist Temporal Classification (CTC) models. Developed entirely in Python and PyTorch, it offers a fast, user-friendly, and extensible alternative to traditional C++, CUDA, or WFST-based decoders. The toolkit features a high-performance, fully batched GPU implementation that eliminates CPU-GPU synchronization and minimizes kernel launch overhead via CUDA Graphs. It also supports advanced contextualization techniques, including GPU-powered N-gram language model fusion and phrase-level boosting. These features enable accurate and efficient decoding, making the toolkit suitable for both research and production use.

Advancements in GPU hardware and deep learning architectures have facilitated the full parallelization of many components in automatic speech recognition (ASR) systems on GPUs. Modern ASR encoder architectures, such as Transformers [1], [2] and Conformers [3], are explicitly engineered to leverage this parallelism by enabling simultaneous computation across audio sequences, thereby maximizing GPU utilization and throughput.
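To make the contrast between greedy and beam decoding concrete, here is a minimal sketch of textbook CTC prefix beam search in pure Python. It is not the toolkit's implementation: it runs sequentially on the CPU, processes a single utterance, and omits language model fusion and phrase boosting. The function names (`ctc_greedy`, `ctc_beam_search`) and the toy two-frame input are assumptions made for illustration only.

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")


def logsumexp(a, b):
    """Numerically stable log(exp(a) + exp(b)) that tolerates -inf."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def ctc_greedy(log_probs, blank=0):
    """Pick the best token per frame, collapse repeats, drop blanks."""
    out, prev = [], blank
    for frame in log_probs:
        k = max(range(len(frame)), key=frame.__getitem__)
        if k != blank and k != prev:
            out.append(k)
        prev = k
    return tuple(out)


def ctc_beam_search(log_probs, beam_size=4, blank=0):
    """Prefix beam search: sums the probability of all alignments per prefix.

    Each prefix keeps two scores: log-probability of alignments ending in
    blank (p_b) and of alignments ending in a non-blank token (p_nb).
    """
    beams = {(): (0.0, NEG_INF)}
    for frame in log_probs:
        nxt = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            p_total = logsumexp(p_b, p_nb)
            for k, lp in enumerate(frame):
                if k == blank:
                    # Blank keeps the prefix unchanged.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (logsumexp(b, p_total + lp), nb)
                elif prefix and prefix[-1] == k:
                    # Repeated token after a non-blank collapses into the prefix.
                    b, nb = nxt[prefix]
                    nxt[prefix] = (b, logsumexp(nb, p_nb + lp))
                    # Repeated token after a blank starts a new symbol.
                    ext = prefix + (k,)
                    b, nb = nxt[ext]
                    nxt[ext] = (b, logsumexp(nb, p_b + lp))
                else:
                    ext = prefix + (k,)
                    b, nb = nxt[ext]
                    nxt[ext] = (b, logsumexp(nb, p_total + lp))
        # Keep only the beam_size most probable prefixes.
        ranked = sorted(nxt.items(), key=lambda kv: logsumexp(*kv[1]), reverse=True)
        beams = dict(ranked[:beam_size])
    best_prefix, scores = max(beams.items(), key=lambda kv: logsumexp(*kv[1]))
    return best_prefix, logsumexp(*scores)


# Toy input: 2 frames, vocabulary {0: blank, 1: "a"}, P(blank)=0.6, P(a)=0.4.
frames = [[math.log(0.6), math.log(0.4)]] * 2
print(ctc_greedy(frames))               # ()   : best single path is blank, blank
print(ctc_beam_search(frames)[0])       # (1,) : alignments "a-", "-a", "aa" sum to 0.64
```

In the toy input, greedy decoding returns the empty sequence because the single best path (probability 0.36) is all blanks, while beam search returns "a" because its three alignments together carry probability 0.64. The nested Python loops over prefixes and tokens are the kind of sequential work that a batched GPU decoder would replace with tensor operations.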

artificial intelligence, machine learning, natural language

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language (1.00)
  - Speech > Speech Recognition (0.77)
  - Machine Learning > Neural Networks
    - Deep Learning (0.89)