A Benchmark for Localizing Code and Non-Code Issues in Software Projects

Zejun Zhang, Jian Wang, Qingyun Yang, Yifan Pan, Yi Tang, Yi Li, Zhenchang Xing, Tian Zhang, Xuandong Li, Guoan Zhang

arXiv.org Artificial Intelligence 

Accurate project localization (e.g., identifying the relevant files and functions) for issue resolution is a critical first step in software maintenance. However, existing benchmarks for issue localization, such as SWE-Bench and LocBench, are limited: they focus predominantly on pull-request issues and code locations, ignoring other evidence and non-code files such as commits, comments, configurations, and documentation. To address this gap, we introduce MULocBench, a comprehensive dataset of 1,100 issues from 46 popular GitHub Python projects. Compared with existing benchmarks, MULocBench offers greater diversity in issue types, root causes, location scopes, and file types, providing a more realistic testbed for evaluation. Using this benchmark, we assess the performance of state-of-the-art localization methods and five LLM-based prompting strategies. Our results reveal significant limitations in current techniques: even at the file level, performance metrics (Acc@5, F1) remain below 40%. This underscores the challenge of generalizing to realistic, multi-faceted issue resolution.

Modern software projects are inherently complex, often consisting of thousands of files spanning code, configurations, tests, and documentation. This complexity means developers routinely encounter a wide spectrum of issues, ranging from runtime failures and unexpected results to enhancement requests and usage questions. A prerequisite for resolving these issues is accurately identifying their locations, such as the relevant files and functions. Existing benchmarks have advanced research on issue localization. SWE-Bench (Jimenez et al.) collects 2,294 issues with pull requests from 12 Python projects, primarily targeting bug fixing. To encourage adoption, it releases SWE-bench Lite, a subset of 300 instances.
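The file-level metrics mentioned above (Acc@5 and F1) can be sketched as follows. This is an illustrative implementation of the standard definitions, not necessarily the exact scoring code used by MULocBench: Acc@k checks whether any gold location appears among the top-k predictions, and F1 is the harmonic mean of precision and recall over the predicted and gold file sets.

```python
def acc_at_k(predicted, gold, k=5):
    """Acc@k: 1 if any gold file appears in the top-k ranked predictions, else 0."""
    return int(any(p in gold for p in predicted[:k]))

def file_f1(predicted, gold):
    """File-level F1 between a predicted file set and the gold file set."""
    pred_set, gold_set = set(predicted), set(gold)
    tp = len(pred_set & gold_set)  # true positives: files in both sets
    if tp == 0:
        return 0.0
    precision = tp / len(pred_set)
    recall = tp / len(gold_set)
    return 2 * precision * recall / (precision + recall)

# Hypothetical example: a localizer ranks two files, one of which is correct.
ranked = ["src/utils.py", "docs/config.md"]
gold = {"docs/config.md"}
print(acc_at_k(ranked, gold, k=5))   # 1 (a gold file is in the top 5)
print(round(file_f1(ranked, gold), 3))  # 0.667 (precision 0.5, recall 1.0)
```

Averaging these per-issue scores over the benchmark yields the aggregate numbers reported in the paper; the below-40% figures indicate that, for most issues, current methods miss the gold files entirely even at the file granularity.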