BoNBoN Alignment for Large Language Models and the Sweetness of Best-of-n Sampling
