Improving Reward Models with Synthetic Critiques