Learning to Think: Information-Theoretic Reinforcement Fine-Tuning for LLMs

Open in new window