Self-Aligned Reward: Towards Effective and Efficient Reasoners
