Listening, Imagining & Refining: A Heuristic Optimized ASR Correction Framework with LLMs

Liu, Yutong; Zhang, Ziyue; Huang, Cheng; Yu, Yongbin; Wang, Xiangxiang; Cai, Yuqing; Tashi, Nyima

arXiv.org Artificial Intelligence 

ABSTRACT
Automatic Speech Recognition (ASR) systems remain prone to errors that affect downstream applications. In this paper, we propose LIR-ASR, a heuristic-optimized iterative correction framework that uses LLMs and is inspired by human auditory perception. LIR-ASR applies a "Listening-Imagining-Refining" strategy: it generates phonetic variants and refines them in context. A heuristic optimization based on a finite state machine (FSM) prevents the correction process from becoming trapped in local optima, while rule-based constraints help maintain semantic fidelity. Experiments on both English and Chinese ASR outputs show that LIR-ASR achieves average CER/WER reductions of up to 1.5 percentage points compared to baselines, demonstrating substantial gains in transcription accuracy.
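The loop described above can be sketched as a toy finite-state controller. This is a minimal illustration under heavy assumptions, not the LIR-ASR implementation. The LLM is replaced by two hypothetical lookups: `HOMOPHONES` stands in for the "Imagining" step (proposing phonetic variants), and `PLAUSIBLE_BIGRAMS` with `score` stands in for the in-context "Refining" judgment. The escape heuristic (accept a non-improving variant a bounded number of times while remembering the best hypothesis) and the word-level edit budget `max_edits` are assumed stand-ins for the paper's FSM heuristic and rule-based constraints. All names (`State`, `neighbours`, `correct`, `max_escapes`) are hypothetical.

```python
from enum import Enum, auto

# Hypothetical toy homophone table standing in for the "Imagining" step;
# in LIR-ASR an LLM proposes phonetic variants instead.
HOMOPHONES = {
    "their": ["there", "they're"],
    "there": ["their", "they're"],
    "two": ["to", "too"],
    "to": ["two", "too"],
    "sea": ["see"],
    "see": ["sea"],
}

# Hypothetical in-context scorer standing in for the LLM's "Refining" judgment:
# counts adjacent word pairs that appear in a tiny set of plausible bigrams.
PLAUSIBLE_BIGRAMS = {("went", "to"), ("to", "the"), ("the", "sea"), ("over", "there")}


def score(words):
    return sum((a, b) in PLAUSIBLE_BIGRAMS for a, b in zip(words, words[1:]))


class State(Enum):
    LISTEN = auto()   # read the raw ASR hypothesis
    IMAGINE = auto()  # generate unseen phonetic variants
    REFINE = auto()   # keep the best variant if it improves the score
    ESCAPE = auto()   # stuck at a local optimum: take a non-improving move
    DONE = auto()


def neighbours(words, original, max_edits):
    """Yield single-word phonetic substitutions within the edit budget."""
    for i, w in enumerate(words):
        for v in HOMOPHONES.get(w, []):
            cand = words[:i] + [v] + words[i + 1:]
            # Assumed rule-based constraint: differ from the ASR output
            # in at most max_edits word positions.
            if sum(a != b for a, b in zip(cand, original)) <= max_edits:
                yield cand


def correct(transcript, max_edits=2, max_escapes=2):
    state, escapes = State.LISTEN, 0
    original = current = best = []
    candidates, visited = [], set()
    while state is not State.DONE:
        if state is State.LISTEN:
            original = current = best = transcript.lower().split()
            visited.add(tuple(current))
            state = State.IMAGINE
        elif state is State.IMAGINE:
            candidates = [c for c in neighbours(current, original, max_edits)
                          if tuple(c) not in visited]
            state = State.REFINE if candidates else State.DONE
        elif state is State.REFINE:
            top = max(candidates, key=score)
            if score(top) > score(current):
                current = top
                visited.add(tuple(current))
                if score(current) > score(best):
                    best = current
                state = State.IMAGINE
            else:
                state = State.ESCAPE
        elif state is State.ESCAPE:
            # Accept the best non-improving unseen variant, at most
            # max_escapes times in total, so the search can leave a local
            # optimum; `best` still remembers the highest-scoring hypothesis.
            if escapes < max_escapes:
                escapes += 1
                current = max(candidates, key=score)
                visited.add(tuple(current))
                state = State.IMAGINE
            else:
                state = State.DONE
    return " ".join(best)


print(correct("we went two the see"))  # prints: we went to the sea
```

The `visited` set keeps the walk from revisiting hypotheses, so the controller always terminates; setting `max_edits=0` leaves the ASR output unchanged. A real system would replace the two toy lookups with LLM calls, which is where this sketch and the actual framework diverge.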