Reward Collapse in Aligning Large Language Models