Agentic Reinforcement Learning for Real-World Code Repair
Zhu, Siyu, Karpovich, Anastasiya, Chen, Albert, Koscheka, Jessica, Jannu, Shailesh, Wen, Di, Zhu, Yuqing, Jain, Rohit, Geramifard, Alborz
arXiv.org Artificial Intelligence
We tackle the challenge of training reliable code-fixing agents in real repositories, where complex builds and shifting dependencies make evaluation unstable. We developed a verifiable pipeline in which success is defined as post-fix build validation, and improved reproducibility across 1K real issues by pinning dependencies and disabling automatic upgrades. Building on this, we introduced a simplified pipeline that scales to large-scale reinforcement learning (RL). Using this setup, we performed supervised fine-tuning (SFT) of Qwen3-32B in the full pipeline and applied RL on top of the SFT model in the simplified environment. The SFT model, distilled from GPT-4.1 trajectories, performs on par while being 56× smaller, and RL added 7-20% absolute gains under matched train-test conditions. "Thinking mode" was on par or worse in our experiments. Both SFT and RL models failed to generalize across environments, highlighting the importance of matching train-test environments when building reliable real-world code-fixing agents.

Large language models (LLMs) have transformed the landscape of code intelligence, powering systems such as GitHub Copilot (Zhang et al., 2023), ChatGPT Code Interpreter (Mutch, 2025), and AlphaCode (Li et al., 2022). These models excel at code completion, bug fixing, and even multi-step development workflows, offering tangible productivity gains in both individual and collaborative programming settings.
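The abstract defines success as post-fix build validation: a candidate fix earns reward only if the patched repository still builds. A minimal sketch of such a binary verifiable reward is below; the function name, the build command, and the timeout are illustrative assumptions, not details from the paper.

```python
import subprocess


def build_reward(repo_dir: str, build_cmd: list[str]) -> float:
    """Binary verifiable reward: 1.0 if the post-fix build succeeds, else 0.0.

    Hypothetical sketch: the paper only states that success is defined as
    post-fix build validation; the exact build command and timeout are
    environment-specific choices made here for illustration.
    """
    result = subprocess.run(
        build_cmd,
        cwd=repo_dir,
        capture_output=True,  # keep build logs out of the agent's transcript
        timeout=1800,         # guard against hung builds
    )
    return 1.0 if result.returncode == 0 else 0.0
```

For example, `build_reward(".", ["true"])` yields 1.0 and `build_reward(".", ["false"])` yields 0.0 on a POSIX system. A binary reward like this is what makes the pipeline "verifiable": the signal comes from the build system itself rather than from a learned or heuristic judge, which is also why pinning dependencies matters, since an unpinned build can flip the reward without any change to the fix.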
Oct-28-2025