Distinguishing Repetition Disfluency from Morphological Reduplication in Bangla ASR Transcripts: A Novel Corpus and Benchmarking Analysis

Arpa, Zaara Zabeen, Apurbo, Sadnam Sakib, Oishee, Nazia Karim Khan, Abrar, Ajwad

Nov-18-2025–arXiv.org Artificial Intelligence

Automatic Speech Recognition (ASR) is now integral to digital interaction, driving applications from virtual assistants to automated subtitles on platforms like YouTube (Dykes et al. 2023). Despite its wide adoption and significant advancements, ASR performance remains imperfect, particularly in Large Vocabulary Continuous Speech Recognition (LVCSR) and under real-world conditions like background noise or diverse accents (Errattahi et al. 2018; Romana et al. 2024). This often results in a significant Word Error Rate (WER). A major, persistent source of error stems from speech disfluencies, which are interruptions in the smooth flow of speech, including filled pauses, hesitations, self-corrections, and, most relevantly, repetitions (Romana et al. 2024). Disfluencies are natural and frequent; one study found a 50% probability of a disfluency in a 10-13 word sentence (Jamshid Lou and Johnson 2020b). Their presence creates noisy transcripts that are difficult to read and detrimental to downstream Natural Language Processing (NLP) tasks such as machine translation or information extraction (Romana et al. 2024; Jamshid Lou and Johnson 2020a).
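To make the link between repetitions and WER concrete, the following is a minimal sketch that uses the standard definition, WER = (substitutions + deletions + insertions) / number of reference words, computed with a word-level edit distance. The function name `word_error_rate` and the English example sentences are illustrative assumptions, not material from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: the transcript keeps two repeated words.
reference = "i want to go home"
disfluent = "i i want to to go home"
print(word_error_rate(reference, disfluent))  # 2 insertions / 5 words = 0.4
```

In this sketch every repeated word is scored as an insertion error against a fluent reference. That holds only when the repetition is a disfluency: as the title indicates, Bangla also uses repetition as morphological reduplication, and in that case the repeated word belongs in the reference as well.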
