Adversarial Moment-Matching Distillation of Large Language Models
Neural Information Processing Systems
In particular, we minimize the imitation gap by matching the action-value moments of the teacher's behavior from both on- and off-policy perspectives.
Oct-11-2025, 00:42:11 GMT