LAPO: Internalizing Reasoning Efficiency via Length-Adaptive Policy Optimization
