RLEF: Grounding Code LLMs in Execution Feedback with Reinforcement Learning

Jonas Gehring, Kunhao Zheng, Jade Copet, Vegard Mella, Taco Cohen, Gabriel Synnaeve

Oct-2-2024–arXiv.org Artificial Intelligence

Large language models (LLMs) deployed as agents solve user-specified tasks over multiple steps while keeping the required manual engagement to a minimum. Crucially, such LLMs need to ground their generations in any feedback obtained to reliably achieve desired outcomes. We propose an end-to-end reinforcement learning method for teaching models to leverage execution feedback in the realm of code synthesis, where state-of-the-art LLMs struggle to improve code iteratively compared to independent sampling. We benchmark on competitive programming tasks, where we achieve new state-of-the-art results with both small (8B parameters) and large (70B) models while reducing the number of samples required by an order of magnitude. Our analysis of inference-time behavior demonstrates that our method produces LLMs that effectively leverage automatic feedback over multiple steps.
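The abstract describes the interaction between model and execution environment only at a high level. The following is a minimal, hypothetical sketch of one multi-turn episode, assuming that execution feedback consists of failing public test cases, that an episode ends once the public tests pass or after `max_turns` attempts, and that a terminal reward of +1 or -1 is computed on held-out private tests. The names `run_tests`, `rollout` and `scripted_policy`, the toy problem, and the reward values are all invented for illustration. A scripted policy stands in for the LLM, and both sandboxing and the reinforcement learning update that would consume the reward are omitted.

```python
from typing import Callable, List, Tuple

# Hypothetical test format: (input argument, expected output) for a function solve(x).
TestCase = Tuple[int, int]
# A turn pairs the submitted code with the feedback it received.
Turn = Tuple[str, List[str]]
# A policy maps (problem statement, previous turns) to new code; an LLM in practice.
Policy = Callable[[str, List[Turn]], str]


def run_tests(code: str, tests: List[TestCase]) -> List[str]:
    """Execute candidate code defining solve(x); return failure messages (empty = all pass)."""
    namespace: dict = {}
    try:
        exec(code, namespace)  # no sandboxing here; real systems must isolate execution
        solve = namespace["solve"]
    except Exception as e:
        return [f"error while loading code: {e!r}"]
    failures = []
    for x, expected in tests:
        try:
            got = solve(x)
        except Exception as e:
            failures.append(f"solve({x}) raised {e!r}")
            continue
        if got != expected:
            failures.append(f"solve({x}) returned {got}, expected {expected}")
    return failures


def rollout(policy: Policy, problem: str, public_tests: List[TestCase],
            private_tests: List[TestCase], max_turns: int = 3) -> Tuple[float, List[Turn]]:
    """Run one episode: generate, execute on public tests, feed failures back, repeat."""
    history: List[Turn] = []
    for _ in range(max_turns):
        code = policy(problem, history)
        feedback = run_tests(code, public_tests)
        history.append((code, feedback))
        if not feedback:  # public tests pass, so the model submits this attempt
            break
    final_code = history[-1][0]
    # Assumed binary terminal reward, judged only on the hidden private tests.
    reward = 1.0 if not run_tests(final_code, private_tests) else -1.0
    return reward, history


def scripted_policy(problem: str, history: List[Turn]) -> str:
    """Hypothetical stand-in for an LLM: first attempt is buggy, later attempts are fixed."""
    if not history:
        return "def solve(x):\n    return x * x + 1\n"
    return "def solve(x):\n    return x * x\n"


def stubborn_policy(problem: str, history: List[Turn]) -> str:
    """Hypothetical policy that ignores feedback and repeats the same buggy code."""
    return "def solve(x):\n    return x * x + 1\n"


PROBLEM = "Return the square of the integer x."
PUBLIC = [(2, 4), (3, 9)]
PRIVATE = [(0, 0), (-4, 16), (10, 100)]

if __name__ == "__main__":
    reward, history = rollout(scripted_policy, PROBLEM, PUBLIC, PRIVATE)
    for i, (code, feedback) in enumerate(history, start=1):
        print(f"turn {i}: {feedback or 'all public tests passed'}")
    print("reward:", reward)
```

In this toy episode the scripted policy fails both public tests on its first turn, receives the failure messages, and submits a corrected function on the second turn, which also passes the private tests. A policy that ignores the feedback, such as `stubborn_policy`, exhausts all turns and receives the negative reward; rewarding only the final outcome is one simple way to make using the feedback worthwhile during training.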

large language model, machine learning, reinforcement learning
