Distillation of Large Language Models via Concrete Score Matching
