Efficient Reasoning via Reward Model