f-Divergence Minimization for Sequence-Level Knowledge Distillation
