Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models
