On the Power of (Approximate) Reward Models for Inference-Time Scaling