Agentic Reinforcement Learning for Real-World Code Repair
Zhu, Siyu, Karpovich, Anastasiya, Chen, Albert, Koscheka, Jessica, Jannu, Shailesh, Wen, Di, Zhu, Yuqing, Jain, Rohit, Geramifard, Alborz
arXiv.org Artificial Intelligence
We tackle the challenge of training reliable code-fixing agents in real repositories, where complex builds and shifting dependencies make evaluation unstable. We developed a verifiable pipeline in which success is defined as post-fix build validation, and improved reproducibility across 1K real issues by pinning dependencies and disabling automatic upgrades. Building on this, we introduced a simplified pipeline that scales to large-scale reinforcement learning (RL). Using this setup, we performed supervised fine-tuning (SFT) of Qwen3-32B in the full pipeline and applied RL on top of the SFT model in the simplified environment. The SFT model, distilled from GPT-4.1 trajectories, performs on par while being 56× smaller, and RL added 7-20% absolute gains under matched train-test conditions. "Thinking mode" was on par or worse in our experiments. Both SFT and RL models failed to generalize across environments, highlighting the importance of matching train-test environments when building reliable real-world code-fixing agents.

Large language models (LLMs) have transformed the landscape of code intelligence, powering systems such as GitHub Copilot (Zhang et al., 2023), ChatGPT Code Interpreter (Mutch, 2025), and AlphaCode (Li et al., 2022). These models excel at code completion, bug fixing, and even multi-step development workflows, offering tangible productivity gains in both individual and collaborative programming settings.
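The abstract defines success as post-fix build validation: a candidate fix earns reward only if the patched repository still builds. A minimal sketch of such a binary verifiable reward is below; the function name, the build command, and the timeout are illustrative assumptions, not details from the paper.

```python
import subprocess


def build_reward(repo_dir: str, build_cmd: list[str]) -> float:
    """Binary verifiable reward: 1.0 if the post-fix build succeeds, else 0.0.

    Hypothetical sketch: the paper only states that success is defined as
    post-fix build validation; the exact build command and timeout are
    environment-specific choices made here for illustration.
    """
    result = subprocess.run(
        build_cmd,
        cwd=repo_dir,
        capture_output=True,  # keep build logs out of the agent's transcript
        timeout=1800,         # guard against hung builds
    )
    return 1.0 if result.returncode == 0 else 0.0
```

For example, `build_reward(".", ["true"])` yields 1.0 and `build_reward(".", ["false"])` yields 0.0 on a POSIX system. A binary reward like this is what makes the pipeline "verifiable": the signal comes from the build system itself rather than from a learned or heuristic judge, which is also why pinning dependencies matters, since an unpinned build can flip the reward without any change to the fix.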
Oct-28-2025