Quantile Regression for Distributional Reward Models in RLHF