Towards Large Language Models that Benefit All: Benchmarking Group Fairness in Reward Models