Weight-Inherited Distillation for Task-Agnostic BERT Compression
Taiqiang Wu, Cheng Hou, Zhe Zhao, Shanshan Lao, Jiayi Li, Ngai Wong, Yujiu Yang
arXiv.org Artificial Intelligence
Knowledge Distillation (KD) is a predominant approach to BERT compression. Previous KD-based methods focus on designing extra alignment losses that make the student model mimic the behavior of the teacher model, so they transfer knowledge only indirectly. In this paper, we propose Weight-Inherited Distillation (WID), which transfers knowledge from the teacher directly. WID requires no additional alignment loss. Instead, it trains a compact student by inheriting the teacher's weights, offering a new perspective on knowledge distillation. Specifically, we design row compactors and column compactors as mappings and then compress the weights via structural re-parameterization. Experimental results on the GLUE and SQuAD benchmarks show that WID outperforms previous state-of-the-art KD-based baselines. Further analysis indicates that WID can also learn the attention patterns of the teacher model without any alignment loss on attention distributions.
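To make the compactor idea concrete, here is a minimal sketch under stated assumptions; it is not the authors' implementation. It assumes a linear layer `y = W x`, a row compactor `R` acting on the output side and a column compactor `C` acting on the input side, so that structural re-parameterization folds them into a single compact weight `W' = R W C`. The abstract does not say how WID trains the compactors or which dimensions it keeps, so the selection-style compactors below (rows of an identity matrix) and the helpers `merge_compactors` and `identity_rows` are hypothetical illustrations. The point shown is only that the three-step path `R (W (C x))` and the merged layer `W' x` compute the same output.

```python
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    assert len(a[0]) == len(b), "inner dimensions must match"
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def matvec(a: Matrix, x: Vector) -> Vector:
    """Matrix-vector product a @ x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]


def merge_compactors(row_c: Matrix, weight: Matrix, col_c: Matrix) -> Matrix:
    """Hypothetical helper: fold both compactors into one weight, W' = R @ W @ C.

    row_c  (R): d_out' x d_out, maps the teacher's output space to the student's
    weight (W): d_out  x d_in,  the teacher weight being inherited
    col_c  (C): d_in   x d_in', maps the student's input space to the teacher's
    """
    return matmul(matmul(row_c, weight), col_c)


def identity_rows(n: int, keep: List[int]) -> Matrix:
    """Hypothetical selection compactor: the rows of the n x n identity listed in keep."""
    return [[1.0 if j == i else 0.0 for j in range(n)] for i in keep]


# Hypothetical 3 x 4 teacher weight (d_out = 3, d_in = 4).
W = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
]

# Illustrative choice: keep output rows 0 and 2, and input columns 1 and 3.
R = identity_rows(3, [0, 2])                                # 2 x 3
C = [list(col) for col in zip(*identity_rows(4, [1, 3]))]   # 4 x 2 (transposed)

W_student = merge_compactors(R, W, C)                       # compact 2 x 2 weight

# Re-parameterization check: the unmerged path equals the merged layer.
x = [0.5, -1.0]
three_step = matvec(R, matvec(W, matvec(C, x)))
merged = matvec(W_student, x)

print(W_student)   # [[2.0, 4.0], [10.0, 12.0]]
print(three_step)  # [-3.0, -7.0]
print(merged)      # [-3.0, -7.0]
```

With selection compactors the merged weight is simply a sub-matrix of the teacher weight, which is one way to picture "inheriting" weights; the equivalence itself follows from associativity of matrix multiplication and holds for any dense `R` and `C` as well. Plain Python lists keep the sketch dependency-free; a real model would use a tensor library, and a layer with a bias `b` would also need the output-side mapping applied to it (`b' = R b`).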
May 15, 2023