From Scores to Preferences: Redefining MOS Benchmarking for Speech Quality Reward Modeling