The Practice of Speech and Language Processing in China
Although automatic speech recognition (ASR) has made great progress, performance still degrades significantly in very noisy environments. Over the past few years, Chinese startup AISpeech has been developing very deep convolutional neural networks (VDCNN) [21], a new architecture the company recently began applying to ASR use cases. Unlike traditional deep CNN models for computer vision, VDCNN features novel filter designs, pooling operations, input feature map selection, and padding strategies, all of which lead to more accurate and robust ASR performance. VDCNN is further extended with adaptation, which can significantly alleviate the mismatch between training and test conditions. Factor-aware training and cluster-adaptive training are explored to make full use of environmental variety and to adapt model parameters quickly.
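The abstract names factor-aware training and cluster-adaptive training without giving details. As a rough illustration only, the sketch below shows the general idea behind each in their simplest textbook form: an environment factor vector is appended to the input features, and a layer's weight matrix is formed as an interpolation of several cluster basis matrices. This is not AISpeech's implementation. The function names, the single fully connected layer (no convolutions), and the toy numbers are all assumptions made for illustration.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def factor_aware_input(features: Vector, factor: Vector) -> Vector:
    """Factor-aware idea (simplest form): append an environment factor
    vector (hypothetical, e.g. a noise embedding) to the acoustic features."""
    return features + factor


def cluster_adaptive_weights(bases: List[Matrix], interp: Vector) -> Matrix:
    """Cluster-adaptive idea: W = sum_c interp[c] * bases[c].

    Only the small interpolation vector needs to be re-estimated for a
    new environment, which is what makes adaptation fast."""
    if len(bases) != len(interp):
        raise ValueError("need one interpolation weight per basis matrix")
    rows, cols = len(bases[0]), len(bases[0][0])
    return [
        [sum(lam * b[i][j] for lam, b in zip(interp, bases)) for j in range(cols)]
        for i in range(rows)
    ]


def adapted_layer(features: Vector, factor: Vector,
                  bases: List[Matrix], interp: Vector) -> Vector:
    """Apply one linear layer using both ideas together (no bias, no nonlinearity)."""
    x = factor_aware_input(features, factor)
    w = cluster_adaptive_weights(bases, interp)
    return matvec(w, x)


# Toy example: 2 acoustic features, 1 environment factor, 2 cluster bases.
bases = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 2.0], [0.0, 0.0, 4.0]],
]
output = adapted_layer([1.0, 2.0], [0.5], bases, [0.5, 0.5])
```

In this toy setup, changing only the two interpolation weights changes the effective layer, while the basis matrices stay fixed across environments.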
Oct-26-2021, 05:11:49 GMT