PILAF: Optimal Human Preference Sampling for Reward Modeling
