Theoretical guarantees on the best-of-n alignment policy
