HAPO: Training Language Models to Reason Concisely via History-Aware Policy Optimization
