N-best T5: Robust ASR Error Correction using Multiple Input Hypotheses and Constrained Decoding Space
Ma, Rao, Gales, Mark J. F., Knill, Kate M., Qian, Mengjie
arXiv.org Artificial Intelligence
Error correction models form an important part of Automatic Speech Recognition (ASR) post-processing to improve the readability and quality of transcriptions. Most prior work uses the 1-best ASR hypothesis as input and can therefore only perform correction by leveraging the context within one sentence. In this work, we propose a novel N-best T5 model for this task, which is fine-tuned from a T5 model and utilizes ASR N-best lists as model input. By transferring knowledge from the pre-trained language model and obtaining richer information from the ASR decoding space, the proposed approach outperforms a strong Conformer-Transducer baseline. Another issue with standard error correction is that the generation process is not well guided. To address this, a constrained decoding process, based on either the N-best list or an ASR lattice, is used, which allows additional information to be propagated.
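The two ideas in the abstract, feeding an N-best list to the model and constraining generation to the N-best list, can be illustrated with a minimal sketch. It is not the authors' implementation: the separator string `SEP`, the word-level trie, and the helper names `build_nbest_input`, `build_trie`, and `allowed_next` are all assumptions made for illustration, and a real T5 setup would operate on subword token IDs rather than whitespace-split words.

```python
from typing import Dict, List

# Hypothetical separator between hypotheses; the abstract does not specify one.
SEP = " <sep> "
# Hypothetical end-of-sequence marker used only inside the trie.
EOS = "</s>"


def build_nbest_input(hypotheses: List[str], n: int = 5) -> str:
    """Join the top-n ASR hypotheses into a single model input string."""
    return SEP.join(hypotheses[:n])


def build_trie(hypotheses: List[str]) -> Dict:
    """Build a word-level prefix trie over the N-best hypotheses."""
    root: Dict = {}
    for hyp in hypotheses:
        node = root
        for word in hyp.split() + [EOS]:
            node = node.setdefault(word, {})
    return root


def allowed_next(trie: Dict, prefix: List[str]) -> List[str]:
    """Return the words that keep `prefix` on a path through the N-best list."""
    node = trie
    for word in prefix:
        if word not in node:
            return []  # prefix has left the constrained space
        node = node[word]
    return sorted(node.keys())


nbest = [
    "i scream for ice cream",
    "ice cream for ice cream",
    "i scream for i scream",
]
model_input = build_nbest_input(nbest)
trie = build_trie(nbest)
```

At each decoding step, a constrained decoder would mask out every candidate not returned by `allowed_next` for the current prefix, so the output is always one of the N-best hypotheses. In a Hugging Face `transformers` setup, a function of this shape could plausibly be adapted to the `prefix_allowed_tokens_fn` argument of `generate`, after switching from words to token IDs.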
Oct-10-2023
- Country:
  - Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Genre:
  - Research Report (1.00)
- Technology:
  - Information Technology
    - Artificial Intelligence
      - Machine Learning (1.00)
      - Natural Language (1.00)
      - Representation & Reasoning (0.96)
      - Speech > Speech Recognition (1.00)
    - Data Science > Data Quality
      - Data Cleaning (0.88)