Attention as a Compass: Efficient Exploration for Process-Supervised RL in Reasoning Models
