Quantile Regression for Distributional Reward Models in RLHF
