Exploring Effective Distillation of Self-Supervised Speech Models for Automatic Speech Recognition

Yujin Wang, Changli Tang, Ziyang Ma, Zhisheng Zheng, Xie Chen, Wei-Qiang Zhang

arXiv.org Artificial Intelligence 

Self-supervised learning (SSL) has achieved great success in speech processing, but always with a large model size to increase the modeling capacity. This may limit its potential applications due to the expensive computation and memory costs introduced by the oversize model. Compression for SSL models has therefore become an important research direction of practical value. To this end, we explore the effective distillation of HuBERT-based SSL models for automatic speech recognition.

Inspired by compression works [9, 10] on the BERT model in the NLP domain, several previous studies have investigated the distillation of SSL models in the speech domain [11, 12, 13, 14], which attempt to reduce the model size of a well-trained SSL model in an unsupervised fashion. Most of these existing works are evaluated on the SUPERB benchmark [15], a generic testing framework for pre-trained models across a range of downstream tasks. The SSL models are evaluated in the constrained track, where the whole upstream model is frozen.
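As a concrete illustration of the unsupervised distillation setup described above, the sketch below shows a DistilHuBERT-style layer-prediction objective, in which a small student regresses the hidden states of selected frozen teacher layers through per-layer projection heads, combining an L1 term with a cosine-similarity term. This is a minimal sketch of one common formulation in this line of work, not the specific method proposed in this paper; all class and variable names are hypothetical.

```python
# Illustrative sketch (hypothetical names, not the paper's exact recipe):
# a DistilHuBERT-style layer-prediction loss. A small student predicts the
# hidden states of selected, frozen teacher layers through per-layer linear
# heads; each head's loss combines an L1 term with a cosine-similarity term.
import torch
import torch.nn as nn
import torch.nn.functional as F


class LayerPredictionLoss(nn.Module):
    def __init__(self, student_dim: int, teacher_dim: int,
                 num_target_layers: int, cos_weight: float = 1.0):
        super().__init__()
        # One linear prediction head per distilled teacher layer.
        self.heads = nn.ModuleList(
            nn.Linear(student_dim, teacher_dim) for _ in range(num_target_layers)
        )
        self.cos_weight = cos_weight

    def forward(self, student_hidden: torch.Tensor,
                teacher_hiddens: list) -> torch.Tensor:
        # student_hidden:  (batch, time, student_dim), last student layer
        # teacher_hiddens: one (batch, time, teacher_dim) tensor per target
        #                  layer, taken from the frozen teacher
        total = student_hidden.new_zeros(())
        for head, target in zip(self.heads, teacher_hiddens):
            pred = head(student_hidden)
            l1 = F.l1_loss(pred, target)
            cos = F.cosine_similarity(pred, target, dim=-1).mean()
            # Minimize L1 distance while encouraging high cosine similarity.
            total = total + l1 - self.cos_weight * F.logsigmoid(cos)
        return total / len(self.heads)
```

In such a setup, the teacher's hidden states would be computed under torch.no_grad(), so that only the student and the prediction heads receive gradients; no transcripts are required, which matches the unsupervised nature of the distillation works cited above.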
