Automated Program Repair: Emerging trends pose and expose problems for benchmarks
Joseph Renzullo, Pemma Reiter, Westley Weimer, Stephanie Forrest
arXiv.org Artificial Intelligence
A variety of techniques have been developed, e.g., evolutionary computation [60, 133], methods incorporating templated mutation operators [71], semantic inference techniques [79] targeting single-cause defects, and methods designed to handle multi-hunk bugs [100]. Increasingly, researchers have applied ML-based methods to APR tasks (Section 3), but data leakage is a concern (Section 4). Each new technique, or modification of an existing technique, tends to be developed by an independent research team, without reference to a common, formal definition of APR. Benchmarks alone are not enough to standardize evaluation (Section 5). As motivating examples, consider the following inconsistencies in the published literature: Correctness. VFix [123] defines correct patches as those that pass all test cases and are semantically or syntactically equivalent to the original bug-fix, while VRepair [26] reports repair accuracy in terms of semantic equivalence to the original bug-fix, and SynFix [10] defines correctness simply as passing the test cases. Each of these is a reasonable definition, but collectively, their differences make it difficult to compare results.
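To make the inconsistency concrete, here is a minimal sketch, not taken from the paper or from any of the cited tools, that expresses each correctness definition as a predicate over a candidate patch. Programs are modeled as Python callables, and semantic equivalence is approximated by agreement on a finite input sample; this modeling and all helper names are illustrative assumptions, not the evaluation procedures VFix, VRepair, or SynFix actually use.

```python
from typing import Callable, Iterable, Tuple

Program = Callable[[int], int]
Test = Tuple[int, int]  # (input, expected output)


def passes_tests(patched: Program, tests: Iterable[Test]) -> bool:
    """SynFix-style correctness: the patched program passes all test cases."""
    return all(patched(x) == expected for x, expected in tests)


def semantically_equivalent(patched: Program, dev_fix: Program,
                            sample: Iterable[int]) -> bool:
    """VRepair-style accuracy: agreement with the original bug-fix,
    approximated here by checking a finite sample of inputs."""
    return all(patched(x) == dev_fix(x) for x in sample)


def vfix_correct(patched: Program, dev_fix: Program,
                 tests: Iterable[Test], sample: Iterable[int]) -> bool:
    """VFix-style correctness: passes every test AND is (here, semantically)
    equivalent to the original bug-fix."""
    return (passes_tests(patched, tests)
            and semantically_equivalent(patched, dev_fix, sample))


if __name__ == "__main__":
    dev_fix: Program = abs  # stand-in for the "original bug-fix"

    def overfit(x: int) -> int:
        # An overfitting patch tailored to the three tests below.
        return 1 if x == -1 else x

    tests = [(3, 3), (0, 0), (-1, 1)]
    print(passes_tests(overfit, tests))                             # True
    print(semantically_equivalent(overfit, dev_fix, range(-5, 6)))  # False
```

In this toy setup, the same overfitting patch is judged correct under a test-passing criterion but incorrect under an equivalence criterion, which is exactly the kind of disagreement that makes published APR results hard to compare.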
May 8, 2024