Post-hoc Reward Calibration: A Case Study on Length Bias