Whisfusion: Parallel ASR Decoding via a Diffusion Transformer
Kwon, Taeyoun; Ahn, Junhyuk; Yun, Taegeun; Jwa, Heeju; Choi, Yoonchae; Park, Siwon; Kim, Nam-Joon; Kim, Jangchan; Ryu, Hyun Gon; Lee, Hyuk-Jae
arXiv.org Artificial Intelligence
Fast Automatic Speech Recognition (ASR) is critical for latency-sensitive applications such as real-time captioning and meeting transcription. However, truly parallel ASR decoding remains challenging due to the sequential nature of autoregressive (AR) decoders and the context limitations of non-autoregressive (NAR) methods. While modern ASR encoders can process up to 30 seconds of audio at once, AR decoders still generate tokens sequentially, creating a latency bottleneck. We propose Whisfusion, the first framework to fuse a pre-trained Whisper encoder with a text diffusion decoder. This NAR architecture resolves the AR latency bottleneck by processing the entire acoustic context in parallel at every decoding step. A lightweight cross-attention adapter trained via parameter-efficient fine-tuning (PEFT) bridges the two modalities. We also introduce a batch-parallel, multi-step decoding strategy that improves accuracy by increasing the number of candidates with minimal impact on speed. Fine-tuned solely on LibriSpeech (960h), Whisfusion achieves a lower WER than Whisper-tiny (8.3% vs. 9.7%), and offers comparable latency on short audio. For longer utterances (>20s), it is up to 2.6x faster than the AR baseline, establishing a new, efficient operating point for long-form ASR. The implementation and training scripts are available at https://github.com/taeyoun811/Whisfusion.
Aug-12-2025
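The abstract describes a non-autoregressive decoder that predicts every token position in parallel at each step, plus a batch-parallel, multi-step strategy that refines several candidates at once. The following is a minimal sketch of that idea, not the authors' implementation. It assumes a masked (absorbing-state) text diffusion decoder, a confidence-based unmasking schedule, and selection of the candidate with the highest summed log-probability; the abstract specifies none of these details. The names `MASK`, `decode_candidate`, `batch_parallel_decode`, and `toy_denoiser` are hypothetical, and the toy denoiser stands in for the diffusion decoder conditioned on Whisper encoder states.

```python
import math
import random
from typing import Callable, List, Tuple

# Assumptions (not from the paper): masked discrete diffusion, confidence-based
# unmasking, and best-candidate selection by summed log-probability.
MASK = -1  # hypothetical id for the [MASK] (absorbing) state

# A denoiser maps a partially masked sequence to one probability distribution
# over the vocabulary per position, all in a single call. In Whisfusion this
# role is played by the decoder conditioned on encoder states (not modeled here).
Denoiser = Callable[[List[int]], List[List[float]]]


def decode_candidate(denoise: Denoiser, length: int, steps: int,
                     rng: random.Random) -> Tuple[List[int], float]:
    """Unmask a fully masked sequence over `steps` parallel refinement steps."""
    assert steps >= 1
    tokens = [MASK] * length
    logprob = 0.0
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        if not masked:
            break
        probs = denoise(tokens)  # one pass predicts every position at once
        # Sample a proposal at every still-masked position.
        proposals = {}
        for i in masked:
            tok = rng.choices(range(len(probs[i])), weights=probs[i])[0]
            proposals[i] = (tok, probs[i][tok])
        # Commit an equal share of the remaining positions, most confident
        # first; on the final step this share covers everything left.
        n_commit = math.ceil(len(masked) / (steps - step))
        ranked = sorted(masked, key=lambda i: proposals[i][1], reverse=True)
        for i in ranked[:n_commit]:
            tok, p = proposals[i]
            tokens[i] = tok
            logprob += math.log(p)
    return tokens, logprob


def batch_parallel_decode(denoise: Denoiser, length: int, steps: int,
                          n_candidates: int, seed: int = 0) -> Tuple[List[int], float]:
    """Decode several independent candidates and keep the highest-scoring one."""
    master = random.Random(seed)
    # Looped here for clarity; on a GPU the candidates would be stacked into
    # one batch so that each refinement step is a single forward pass.
    candidates = [
        decode_candidate(denoise, length, steps, random.Random(master.randrange(2**32)))
        for _ in range(n_candidates)
    ]
    return max(candidates, key=lambda c: c[1])


def toy_denoiser(target: List[int], vocab_size: int, p_correct: float) -> Denoiser:
    """Hypothetical stand-in for an encoder-conditioned decoder that favours `target`."""
    rest = (1.0 - p_correct) / (vocab_size - 1)

    def denoise(tokens: List[int]) -> List[List[float]]:
        return [[p_correct if v == target[i] else rest for v in range(vocab_size)]
                for i in range(len(tokens))]

    return denoise


if __name__ == "__main__":
    target = [3, 1, 4, 1, 5, 9]
    tokens, score = batch_parallel_decode(
        toy_denoiser(target, vocab_size=10, p_correct=0.9),
        length=len(target), steps=3, n_candidates=8)
    print(tokens, round(score, 3))
```

With `length=6` and `steps=3`, each candidate needs three denoiser passes no matter how long the sequence is, whereas a token-by-token AR decoder would need six. If the `n_candidates` sequences share each batched forward pass, adding candidates costs little extra latency. This is consistent with the abstract's claim that more candidates improve accuracy with minimal impact on speed.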