DOCE: Finding the Sweet Spot for Execution-Based Code Generation

Li, Haau-Sing, Fernandes, Patrick, Gurevych, Iryna, Martins, André F. T.

Sep-13-2024–arXiv.org Artificial Intelligence

Recently, a diverse set of decoding and reranking procedures have been shown effective for LLM-based code generation. However, a comprehensive framework that links and experimentally compares these methods is missing. We address this by proposing Decoding Objectives for Code Execution, a comprehensive framework that includes candidate generation, $n$-best reranking, minimum Bayes risk (MBR) decoding, and self-debugging as the core components. We then study the contributions of these components through execution-based evaluation metrics. Our findings highlight the importance of execution-based methods and the difference gap between execution-based and execution-free methods. Furthermore, we assess the impact of filtering based on trial unit tests, a simple and effective strategy that has been often overlooked in prior works. We also propose self-debugging on multiple candidates, obtaining state-of-the-art performance on reranking for code generation. We expect our framework to provide a solid guideline for future research on code generation.

mbr-exec, test case, unit test, (16 more...)

arXiv.org Artificial Intelligence

Sep-13-2024

arXiv.org PDF

Add feedback

Country:
- North America
  - Dominican Republic (0.04)
  - United States
    - Pennsylvania (0.04)
    - Washington > King County
      - Seattle (0.04)
- Europe
  - Spain > Catalonia
    - Barcelona Province > Barcelona (0.04)
  - Portugal > Lisbon
    - Lisbon (0.14)
  - Italy > Tuscany
    - Florence (0.04)
  - Germany > Hesse
    - Darmstadt Region > Darmstadt (0.04)
- Asia
  - Singapore (0.04)
  - Thailand > Bangkok
    - Bangkok (0.04)
  - Middle East > UAE
    - Abu Dhabi Emirate > Abu Dhabi (0.04)

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Automatic Programming (1.00)
  - Natural Language > Large Language Model (0.91)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found