Self-play with Execution Feedback: Improving Instruction-following Capabilities of Large Language Models
