Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
