CopyNE: Better Contextual ASR by Copying Named Entities
Shilin Zhou, Zhenghua Li, Yu Hong, Min Zhang, Zhefeng Wang, Baoxing Huai
arXiv.org Artificial Intelligence
Recent years have seen remarkable progress in automatic speech recognition (ASR). However, traditional token-level ASR models still struggle to transcribe entities accurately because of homophonic and near-homophonic tokens. This paper introduces CopyNE, a novel approach that uses a span-level copying mechanism to improve how ASR transcribes entities. CopyNE can copy all tokens of an entity at once, avoiding the errors that homophonic or near-homophonic tokens cause when the tokens of an entity are predicted one by one. Experiments on the Aishell and ST-cmds datasets show that CopyNE significantly reduces character error rate (CER) and named entity CER (NE-CER), especially in entity-rich scenarios. Furthermore, even compared with the strong Whisper baseline, CopyNE still achieves notable reductions in CER and NE-CER. Qualitative comparisons with previous approaches show that CopyNE handles entities better, effectively improving ASR accuracy.
May-22-2023
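To illustrate the contrast the abstract draws between predicting an entity token by token and copying it as one span, here is a minimal toy sketch in Python. It is not the CopyNE implementation: the gated choice between emitting one token and copying one entity, the function name `decode`, the `Step` tuple layout, and every probability below are hypothetical assumptions chosen only for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical per-step scores (NOT taken from CopyNE):
#   token_probs  - distribution over single vocabulary tokens (characters)
#   entity_probs - distribution over entries of a contextual entity list
#   copy_gate    - assumed probability of copying an entity instead of emitting a token
Step = Tuple[Dict[str, float], Dict[str, float], float]


def decode(steps: List[Step]) -> List[str]:
    """Greedy toy decoder: each step emits one token or copies a whole entity."""
    output: List[str] = []
    for token_probs, entity_probs, copy_gate in steps:
        tok, p_tok = max(token_probs.items(), key=lambda kv: kv[1])
        if entity_probs:
            ent, p_ent = max(entity_probs.items(), key=lambda kv: kv[1])
        else:
            ent, p_ent = "", 0.0
        # Assumed gating scheme: compare the gated scores of the two actions.
        if copy_gate * p_ent > (1.0 - copy_gate) * p_tok:
            output.extend(ent)  # span-level: all characters of the entity at once
        else:
            output.append(tok)  # token-level: exactly one character
    return output


# Token-level only: each character competes with a homophone (made-up scores).
steps_token: List[Step] = [
    ({"张": 0.55, "章": 0.45}, {}, 0.0),  # both read zhang
    ({"子": 0.9, "紫": 0.1}, {}, 0.0),    # both read zi
    ({"怡": 0.4, "仪": 0.6}, {}, 0.0),    # both read yi
]

# Span-level: a single step that may copy from a contextual entity list.
steps_copy: List[Step] = [
    ({"张": 0.55, "章": 0.45}, {"章子怡": 0.9, "李娜": 0.1}, 0.8),
]

print("".join(decode(steps_token)))  # 张子仪 (wrong homophones)
print("".join(decode(steps_copy)))   # 章子怡 (entity copied whole)
```

Under these made-up scores, the token-level path picks the more probable homophone at each of three separate steps and ends up with 张子仪, while a single copy step emits all three characters of the listed entity 章子怡 together, so no per-character homophone choice is ever made.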
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Natural Language > Text Processing (1.00)
- Speech > Speech Recognition (1.00)