Self-Aligned Reward: Towards Effective and Efficient Reasoners