A Critical Evaluation of AI Feedback for Aligning Large Language Models

Neural Information Processing Systems 

Learning from AI feedback (LAIF) is a popular paradigm for improving the instruction-following abilities of powerful pre-trained language models.