CAT: CRF-based ASR Toolkit
An, Keyu; Xiang, Hongyu; Ou, Zhijian
ABSTRACT

In this paper, we present a new open source toolkit for automatic speech recognition (ASR), named CAT (CRF-based ASR Toolkit). A key feature of CAT is discriminative training in the framework of conditional random fields (CRFs), particularly with a connectionist temporal classification (CTC) inspired state topology. CAT contains a full-fledged implementation of CTC-CRF and provides a complete workflow for CRF-based end-to-end speech recognition. Evaluation results on English and Chinese benchmarks such as Switchboard and Aishell show that CAT obtains state-of-the-art results among existing end-to-end models with fewer parameters, and is competitive with hybrid DNN-HMM models. To demonstrate its flexibility, we show that i-vector based speaker-adapted recognition and a latency control mechanism can be explored easily and effectively in CAT. We hope that CAT, especially the CRF-based framework and software, will be of broad interest to the community and can be further explored and improved.

Index Terms: speech recognition, open source toolkit, conditional random field, end-to-end

1. INTRODUCTION

In addition to theories and algorithms, open source toolkits make substantial contributions to automatic speech recognition (ASR) technologies.
Nov-20-2019