Knowledge Distillation for Closed-Source Language Models

Chen, Hongzhan, Quan, Xiaojun, Chen, Hehong, Yan, Ming, Zhang, Ji

Jan-13-2024–arXiv.org Artificial Intelligence

Closed-source language models such as GPT-4 have achieved remarkable performance. Many recent studies focus on enhancing the capabilities of smaller models through knowledge distillation from closed-source language models. However, because the weights, hidden states, and output distributions of these closed-source models cannot be accessed directly, distillation can only be performed by fine-tuning smaller models on data samples generated by the closed-source models, which limits the effectiveness of knowledge distillation. In this paper, we propose to estimate the output distributions of closed-source language models within a Bayesian estimation framework that involves both prior and posterior estimation. The prior estimation derives a prior distribution from the corpus generated by closed-source language models, while the posterior estimation employs a proxy model to update the prior distribution and derive a posterior distribution. With the estimated output distribution of the closed-source language model in hand, traditional knowledge distillation can then be carried out. Experimental results demonstrate that our method surpasses the performance of current models directly fine-tuned on data generated by closed-source language models.
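The abstract does not give the estimation formulas, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes the prior is a label-smoothed distribution centered on the token the closed-source model actually generated, and that the posterior is a pseudo-count (Dirichlet-style) blend of that prior with a proxy model's predicted distribution. The function names and the parameters `smoothing`, `prior_strength`, and `proxy_weight` are hypothetical and chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def prior_from_corpus(observed_token, vocab_size, smoothing=0.1):
    """Assumed prior: label-smoothed one-hot around the teacher-generated token."""
    off = smoothing / (vocab_size - 1)
    return [1.0 - smoothing if i == observed_token else off
            for i in range(vocab_size)]


def posterior_update(prior, proxy_probs, prior_strength=10.0, proxy_weight=10.0):
    """Assumed update: treat the prior and the proxy prediction as pseudo-counts
    and renormalize, in the style of a Dirichlet-multinomial posterior mean."""
    counts = [prior_strength * p + proxy_weight * q
              for p, q in zip(prior, proxy_probs)]
    total = sum(counts)
    return [c / total for c in counts]


def kl_divergence(target, student, eps=1e-12):
    """KL(target || student), the usual soft-label distillation loss."""
    return sum(t * math.log((t + eps) / (s + eps))
               for t, s in zip(target, student) if t > 0)


# Toy example with a 4-token vocabulary; the teacher emitted token 2.
prior = prior_from_corpus(observed_token=2, vocab_size=4)
proxy = softmax([0.5, 0.1, 2.0, 0.3])      # hypothetical proxy-model logits
posterior = posterior_update(prior, proxy)  # estimated teacher distribution
student = softmax([0.0, 0.0, 1.0, 0.0])    # hypothetical student logits
loss = kl_divergence(posterior, student)    # distillation loss to minimize
```

In this sketch the ratio of `prior_strength` to `proxy_weight` controls how much the estimated distribution trusts the generated corpus versus the proxy model; the paper itself may weight the two sources differently.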

distillation, knowledge distillation, language model

Country:
- Asia > China (0.04)
- North America > United States
  - Minnesota > Hennepin County > Minneapolis (0.14)

Genre:
- Research Report (1.00)

Industry:
- Education (0.71)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.71)
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.34)
  - Machine Learning > Learning Graphical Models
    - Directed Networks > Bayesian Learning (0.34)