On the Convergence of Self-Improving Online LLM Alignment
