Evaluating Robustness of Reward Models for Mathematical Reasoning