Greedy Sampling Is Provably Efficient For RLHF
