Mitigating Reward Overoptimization via Lightweight Uncertainty Estimation