GHPO: Adaptive Guidance for Stable and Efficient LLM Reinforcement Learning
