Mitigating Reward Over-Optimization in RLHF via Behavior-Supported Regularization