Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code

Chae, Hyungjoo, Kwon, Taeyoon, Moon, Seungjun, Song, Yongho, Kang, Dongjin, Ong, Kai Tzu-iunn, Kwak, Beong-woo, Bae, Seonghyeon, Hwang, Seung-won, Yeo, Jinyoung

Oct-4-2024–arXiv.org Artificial Intelligence

This paper presents Coffee-Gym, a comprehensive RL environment for training models that provide feedback on code editing. Coffee-Gym includes two major components: (1) Coffee, a dataset containing humans' code edit traces for coding questions and machine-written feedback for editing erroneous code; (2) CoffeeEval, a reward function that faithfully reflects the helpfulness of feedback by assessing the performance of the revised code in unit tests. With them, Coffee-Gym addresses the unavailability of high-quality datasets for training feedback models with RL, and provides more accurate rewards than the SOTA reward model (i.e., GPT-4). By applying Coffee-Gym, we elicit feedback models that outperform baselines in enhancing open-source code LLMs' code editing, making them comparable with closed-source LLMs. We make the dataset and the model checkpoint publicly available.
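The idea of rewarding feedback by how well the revised code performs in unit tests can be illustrated with a minimal sketch. This is not the authors' CoffeeEval implementation: the abstract does not describe how the revised code is produced or executed, so the sketch simply takes the revised code as given and scores it by its unit-test pass rate. The names `pass_rate_reward` and `entry_point`, and the `(args, expected)` test format, are hypothetical choices made for illustration.

```python
from typing import Any, List, Tuple


def pass_rate_reward(
    revised_code: str,
    tests: List[Tuple[tuple, Any]],
    entry_point: str = "solve",  # hypothetical name of the function under test
) -> float:
    """Return the fraction of unit tests that the revised code passes."""
    namespace: dict = {}
    try:
        # Assumption: the code is trusted. Real systems should run it in a sandbox.
        exec(revised_code, namespace)
    except Exception:
        return 0.0  # code that fails to load earns no reward

    func = namespace.get(entry_point)
    if not callable(func) or not tests:
        return 0.0

    passed = 0
    for args, expected in tests:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error counts as a failed test
    return passed / len(tests)


# Hypothetical unit tests for a function that adds two integers.
tests = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0), ((2, 2), 4)]

buggy = "def solve(a, b):\n    return a - b\n"
fixed = "def solve(a, b):\n    return a + b\n"

print(pass_rate_reward(buggy, tests))  # code before helpful feedback
print(pass_rate_reward(fixed, tests))  # code after helpful feedback
```

Under this sketch, feedback that leads an editor to the `fixed` version would receive a higher reward than feedback that leaves the `buggy` version unchanged, which is the sense in which a test-based reward reflects helpfulness.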

code editing, editing, feedback model

Country:
- North America > Canada
  - Ontario > Toronto (0.04)
- Asia > South Korea
  - Seoul > Seoul (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)