Adversarial Moment-Matching Distillation of Large Language Models

Neural Information Processing Systems 

In particular, we minimize the imitation gap by matching the action-value moments of the teacher's behavior from both on- and off-policy perspectives.
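As a rough illustration of what matching action-value moments could look like, the toy sketch below compares the expected action value under the teacher's and the student's next-token distributions, averaged over a set of states. Everything in it is an assumption made for illustration: the helper name `moment_gap`, the two-state, two-action setup, the fixed Q-values, and the reading of "on-policy" as states drawn from student rollouts and "off-policy" as states drawn from a fixed dataset. It is not the paper's objective, which is adversarial; here the Q-values are simply held fixed.

```python
# Toy illustration only: all states, probabilities, and Q-values are made up.

def moment_gap(teacher_probs, student_probs, q_values, states):
    """Mean over `states` of E_{a~teacher}[Q(s, a)] - E_{a~student}[Q(s, a)]."""
    total = 0.0
    for s in states:
        teacher_moment = sum(p * q for p, q in zip(teacher_probs[s], q_values[s]))
        student_moment = sum(p * q for p, q in zip(student_probs[s], q_values[s]))
        total += teacher_moment - student_moment
    return total / len(states)


# Two states, two actions (tokens) per state.
teacher_probs = {"s0": [0.9, 0.1], "s1": [0.2, 0.8]}
student_probs = {"s0": [0.5, 0.5], "s1": [0.2, 0.8]}
q_values = {"s0": [1.0, 0.0], "s1": [0.0, 1.0]}  # hypothetical fixed Q-function

# Assumption: on-policy states come from student rollouts,
# off-policy states come from a fixed dataset.
on_policy_states = ["s0", "s0", "s1"]
off_policy_states = ["s0", "s1"]

on_gap = moment_gap(teacher_probs, student_probs, q_values, on_policy_states)
off_gap = moment_gap(teacher_probs, student_probs, q_values, off_policy_states)
print(f"on-policy gap:  {on_gap:.4f}")
print(f"off-policy gap: {off_gap:.4f}")
```

In this toy setting the student disagrees with the teacher only in state `s0`, so the gap is larger under whichever state distribution visits `s0` more often; a student identical to the teacher would have zero gap under both.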