On Designing Effective RL Reward at Training Time for LLM Reasoning