Accurate Knowledge Distillation with n-best Reranking
– arXiv.org Artificial Intelligence
We propose using n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016), exploring hypotheses beyond the top-1 to obtain more accurate pseudo-labels. To accomplish this, we leverage a diverse set of models with different inductive biases, objective functions, or architectures, including publicly available large pretrained models. The effectiveness of our proposal is validated through experiments on the WMT'21 German-English and Chinese-English translation tasks. Our results demonstrate that training on the pseudo-labels generated by our n-best reranker yields a significantly more accurate student model. In fact, our best student model achieves accuracy comparable to that of a large translation model from Tran et al. (2021) with 4.7 billion parameters, while having two orders of magnitude fewer parameters.
Nov-14-2023
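The core idea in the abstract, selecting a pseudo-label from an n-best list by combining scores from several diverse models, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the scorer functions and weights below are hypothetical stand-ins for the diverse model ensemble the authors describe.

```python
def rerank_nbest(hypotheses, scorers, weights):
    """Return the hypothesis maximizing a weighted sum of model scores.

    hypotheses: list of candidate translations (the n-best list).
    scorers: list of callables mapping a hypothesis to a score; in the
        paper's setting these would be models with different inductive
        biases, objectives, or architectures.
    weights: per-scorer interpolation weights.
    """
    def combined_score(hyp):
        return sum(w * score(hyp) for score, w in zip(scorers, weights))
    return max(hypotheses, key=combined_score)

# Toy scorers standing in for real models (purely illustrative):
# one rewards longer outputs, one rewards outputs near a target length.
scorers = [len, lambda h: -abs(len(h) - 10)]
weights = [0.5, 0.5]
nbest = ["die katze", "the cat sat", "a cat"]
pseudo_label = rerank_nbest(nbest, scorers, weights)  # → "the cat sat"
```

In sequence-level distillation, the selected `pseudo_label` (rather than the decoder's top-1 output) would then serve as the training target for the student model.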