Greedy Sampling Is Provably Efficient for RLHF
