Learning to Maximize Mutual Information for Chain-of-Thought Distillation
