Automated Program Repair: Emerging trends pose and expose problems for benchmarks
Joseph Renzullo, Pemma Reiter, Westley Weimer, Stephanie Forrest
arXiv.org Artificial Intelligence
A variety of techniques have been developed, e.g., evolutionary computation [60, 133], methods incorporating templated mutation operators [71], semantic inference techniques [79] targeting single-cause defects, and methods designed to handle multi-hunk bugs [100]. Increasingly, researchers have applied ML-based methods to APR tasks (Section 3), but data leakage is a concern (Section 4). Each new technique, or modification of an existing technique, tends to be developed by an independent research team, without reference to a common, formal definition of APR. Benchmarks alone are not enough to standardize evaluation (Section 5). As motivating examples, consider the following inconsistencies in the published literature: Correctness. VFix [123] defines correct patches as those that pass all test cases and are semantically or syntactically equivalent to the original bug-fix, while VRepair [26] reports repair accuracy in terms of semantic equivalence to the original bug-fix, and SynFix [10] defines correctness simply as passing the test cases. Each of these is a reasonable definition, but collectively, their differences make it difficult to compare results.
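To make the inconsistency concrete, here is a minimal sketch, not taken from the paper or from any of the cited tools, that expresses each correctness definition as a predicate over a candidate patch. Programs are modeled as Python callables, and semantic equivalence is approximated by agreement on a finite input sample; this modeling and all helper names are illustrative assumptions, not the evaluation procedures VFix, VRepair, or SynFix actually use.

```python
from typing import Callable, Iterable, Tuple

Program = Callable[[int], int]
Test = Tuple[int, int]  # (input, expected output)


def passes_tests(patched: Program, tests: Iterable[Test]) -> bool:
    """SynFix-style correctness: the patched program passes all test cases."""
    return all(patched(x) == expected for x, expected in tests)


def semantically_equivalent(patched: Program, dev_fix: Program,
                            sample: Iterable[int]) -> bool:
    """VRepair-style accuracy: agreement with the original bug-fix,
    approximated here by checking a finite sample of inputs."""
    return all(patched(x) == dev_fix(x) for x in sample)


def vfix_correct(patched: Program, dev_fix: Program,
                 tests: Iterable[Test], sample: Iterable[int]) -> bool:
    """VFix-style correctness: passes every test AND is (here, semantically)
    equivalent to the original bug-fix."""
    return (passes_tests(patched, tests)
            and semantically_equivalent(patched, dev_fix, sample))


if __name__ == "__main__":
    dev_fix: Program = abs  # stand-in for the "original bug-fix"

    def overfit(x: int) -> int:
        # An overfitting patch tailored to the three tests below.
        return 1 if x == -1 else x

    tests = [(3, 3), (0, 0), (-1, 1)]
    print(passes_tests(overfit, tests))                             # True
    print(semantically_equivalent(overfit, dev_fix, range(-5, 6)))  # False
```

In this toy setup, the same overfitting patch is judged correct under a test-passing criterion but incorrect under an equivalence criterion, which is exactly the kind of disagreement that makes published APR results hard to compare.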
May 8, 2024