Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
