Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization

Jun-10-2026, 06:12:36 GMT–Neural Information Processing Systems

Large Language Models (LLMs) generate functionally correct solutions but often fall short in code efficiency, a critical bottleneck for real-world deployment. In this paper, we introduce a novel test-time iterative optimization framework to address this, employing a closed-loop system where LLMs iteratively refine code based on empirical performance feedback from an execution sandbox. We explore three training strategies: Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Group Relative Policy Optimization (GRPO). Experiments on our Venus dataset and the APPS benchmark show that SFT and DPO rapidly saturate in efficiency gains.
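The closed-loop refinement described in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the names `measure`, `optimize`, and `propose` are hypothetical, `propose` stands in for the LLM call, and plain `exec` stands in for the execution sandbox (it provides no isolation and should not be used on untrusted code). The sketch only shows the general shape of the loop: generate a candidate, measure it, keep it if it is correct and faster, and feed the measurement back.

```python
import time
from typing import Callable, Tuple

def measure(code: str, test: str) -> Tuple[bool, float]:
    """Run candidate code and its test; return (passed, elapsed seconds).

    Assumption: exec() is a stand-in for a real sandbox and is NOT isolated.
    """
    namespace: dict = {}
    start = time.perf_counter()
    try:
        exec(code, namespace)   # define the candidate solution
        exec(test, namespace)   # assertions raise on incorrect behaviour
        passed = True
    except Exception:
        passed = False
    return passed, time.perf_counter() - start

def optimize(
    code: str,
    test: str,
    propose: Callable[[str, str], str],  # hypothetical LLM call: (code, feedback) -> new code
    rounds: int = 3,
    measure: Callable[[str, str], Tuple[bool, float]] = measure,
) -> Tuple[str, float]:
    """Iteratively refine `code`, keeping the fastest candidate that passes."""
    best_code = code
    passed, elapsed = measure(code, test)
    best_time = elapsed if passed else float("inf")
    feedback = f"passed={passed}, seconds={elapsed:.6f}"
    for _ in range(rounds):
        candidate = propose(best_code, feedback)
        ok, elapsed = measure(candidate, test)
        # Accept only candidates that are both correct and strictly faster.
        if ok and elapsed < best_time:
            best_code, best_time = candidate, elapsed
        feedback = f"passed={ok}, seconds={elapsed:.6f}"
    return best_code, best_time
```

Passing `measure` as a parameter is a design choice made here so that the loop can be exercised with a deterministic cost function; a real setup would measure wall-clock time, and possibly memory, inside an isolated sandbox.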

artificial intelligence, large language model, natural language

Technology:
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.87)