Open Source State-Of-the-Art Solution for Romanian Speech Recognition

Pirlogeanu, Gabriel, Georgescu, Alexandru-Lucian, Cucu, Horia

Nov-6-2025–arXiv.org Artificial Intelligence

Abstract--In this work, we present a new state-of-the-art Romanian Automatic Speech Recognition (ASR) system based on NVIDIA's FastConformer architecture--explored here for the first time in the context of Romanian. We train our model on a large corpus of, mostly, weakly supervised transcriptions, totaling over 2,600 hours of speech. Leveraging a hybrid decoder with both Connectionist T emporal Classification (CTC) and T oken-Duration Transducer (TDT) branches, we evaluate a range of decoding strategies including greedy, ALSD, and CTC beam search with a 6-gram token-level language model. Our system achieves state-of-the-art performance across all Romanian evaluation benchmarks, including read, spontaneous, and domain-specific speech, with up to 27% relative WER reduction compared to previous best-performing systems. In addition to improved transcription accuracy, our approach demonstrates practical decoding efficiency, making it suitable for both research and deployment in low-latency ASR applications. Automatic Speech Recognition (ASR) has undergone a paradigm shift over the past decade, driven by the rise of end-to-end architectures and the increasing availability of large-scale datasets. Models such as RNN-Transducer, Transformer-Transducer, wav2vec, Whisper, Conformer [1] have dramatically improved recognition accuracy across many languages. Most recently, Speech Large Language Models (SpeechLLMs) [2] have further advanced the field by integrating multimodal and multilingual supervision at unprecedented scale.

artificial intelligence, natural language, speech recognition, (14 more...)

arXiv.org Artificial Intelligence

Nov-6-2025

arXiv.org PDF

Add feedback

Country:
- Europe (0.46)

Genre:
- Research Report > Promising Solution (0.50)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Natural Language (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found