Accurate Knowledge Distillation with n-best Reranking
We propose utilizing n-best reranking to enhance Sequence-Level Knowledge Distillation (Kim and Rush, 2016), exploring hypotheses beyond the top-1 to acquire more accurate pseudo-labels. To accomplish this, we leverage a diverse set of models with different inductive biases, objective functions, or architectures, including publicly available large pretrained models. The effectiveness of our proposal is validated through experiments on the WMT'21 German-English and Chinese-English translation tasks. Our results demonstrate that utilizing the pseudo-labels generated by our n-best reranker leads to a significantly more accurate student model. In fact, our best student model achieves accuracy comparable to a large translation model from Tran et al. (2021) with 4.7 billion parameters, while having two orders of magnitude fewer parameters.
arXiv.org Artificial Intelligence
Nov-14-2023
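
The core step the abstract describes is scoring each of the teacher's n-best hypotheses with an ensemble of diverse models and keeping the highest-scoring one as the pseudo-label for student training. Below is a minimal sketch of that reranking step in Python; the `Scorer` type, the `rerank_nbest` function, and the toy stub scorers are illustrative assumptions, not the paper's actual implementation.

```python
from typing import Callable, Sequence

# A scorer maps (source sentence, hypothesis) to a log-score.
# In the paper's setting these would be diverse models (different inductive
# biases, objectives, or architectures); here they are illustrative stubs.
Scorer = Callable[[str, str], float]


def rerank_nbest(
    source: str,
    hypotheses: Sequence[str],
    scorers: Sequence[Scorer],
    weights: Sequence[float],
) -> str:
    """Return the hypothesis that maximizes a weighted sum of model scores."""
    if len(scorers) != len(weights):
        raise ValueError("one weight per scorer is required")

    def combined_score(hyp: str) -> float:
        return sum(w * score(source, hyp) for score, w in zip(scorers, weights))

    return max(hypotheses, key=combined_score)


if __name__ == "__main__":
    # Toy example with two hypothetical stand-ins for real model scorers.
    teacher_logprob: Scorer = lambda src, hyp: -abs(len(hyp) - len(src)) / 10.0
    brevity_penalty: Scorer = lambda src, hyp: -len(hyp) / 100.0

    nbest = ["the cat sits on the mat", "a cat sat on the mat", "cat on mat"]
    pseudo_label = rerank_nbest(
        source="die Katze sitzt auf der Matte",
        hypotheses=nbest,
        scorers=[teacher_logprob, brevity_penalty],
        weights=[1.0, 0.5],
    )
    print(pseudo_label)  # the winning hypothesis becomes the pseudo-label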