CR-CTC: Consistency regularization on CTC for improved speech recognition

Yao, Zengwei, Kang, Wei, Yang, Xiaoyu, Kuang, Fangjun, Guo, Liyong, Zhu, Han, Jin, Zengrui, Li, Zhaoqing, Lin, Long, Povey, Daniel

Dec-8-2024, arXiv.org Artificial Intelligence

Connectionist Temporal Classification (CTC) is a widely used method for automatic speech recognition (ASR), renowned for its simplicity and computational efficiency. However, it often falls short in recognition performance. In this work, we propose Consistency-Regularized CTC (CR-CTC), which enforces consistency between two CTC distributions obtained from different augmented views of the input speech mel-spectrogram. We provide in-depth insights into its essential behaviors from three perspectives: 1) it conducts self-distillation between random pairs of sub-models that process different augmented views; 2) it learns contextual representations through masked prediction for positions within time-masked regions, especially when the amount of time masking is increased; 3) it suppresses extremely peaky CTC distributions, thereby reducing overfitting and improving generalization. Extensive experiments on the LibriSpeech, Aishell-1, and GigaSpeech datasets demonstrate the effectiveness of CR-CTC. It significantly improves CTC performance, achieving state-of-the-art results comparable to those attained by transducer systems or by systems combining CTC and an attention-based encoder-decoder (CTC/AED). We release our code at https://github.com/k2-fsa/icefall.
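To make the consistency idea concrete, here is a minimal sketch of a consistency term between the per-frame output distributions of two augmented views. The abstract does not give the exact form of the regularizer, so this sketch assumes a symmetric KL divergence averaged over frames. The function names (`softmax`, `kl_divergence`, `consistency_loss`) and the toy logits are illustrative and are not taken from the released code.

```python
import math


def softmax(logits):
    """Convert one frame of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * (math.log(pi + eps) - math.log(qi + eps))
               for pi, qi in zip(p, q))


def consistency_loss(logits_a, logits_b):
    """Symmetric KL between two views, averaged over frames.

    logits_a, logits_b: lists of frames, each frame a list of
    per-token logits (index 0 is assumed to be the CTC blank).
    ASSUMPTION: symmetric KL is one plausible consistency measure;
    the abstract does not specify which one CR-CTC uses.
    """
    if len(logits_a) != len(logits_b) or not logits_a:
        raise ValueError("both views need the same non-zero number of frames")
    total = 0.0
    for frame_a, frame_b in zip(logits_a, logits_b):
        p = softmax(frame_a)
        q = softmax(frame_b)
        total += 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))
    return total / len(logits_a)


# Toy example: 3 frames, vocabulary of 4 (blank + 3 tokens).
view_a = [[2.0, 0.1, 0.1, 0.1], [0.1, 3.0, 0.1, 0.1], [2.5, 0.1, 0.1, 0.1]]
view_b = [[1.5, 0.3, 0.1, 0.2], [0.2, 2.0, 0.4, 0.1], [0.1, 0.1, 2.0, 0.1]]

print(consistency_loss(view_a, view_a))  # identical views give zero loss
print(consistency_loss(view_a, view_b))  # differing views give a positive loss
```

In a training setup, a term like this would typically be added, with some weight, to the CTC losses computed on each view. That weighting is also an assumption here and not a detail stated in the abstract.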

artificial intelligence, cr-ctc, machine learning

Country:
- South America > Chile
  - Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- Europe > Netherlands
  - North Holland > Amsterdam (0.04)
- Asia > China
  - Beijing > Beijing (0.04)

Genre:
- Research Report (0.64)

Industry:
- Education (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Speech > Speech Recognition (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.46)