AITopics | bug localization

Collaborating Authors

bug localization

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

f29a179746902e331572c483c45e5086-AuthorFeedback.pdf

Neural Information Processing SystemsFeb-15-2026, 01:38:36 GMT

attribution, localization, reviewer, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.70)

Add feedback

Reformulate, Retrieve, Localize: Agents for Repository-Level Bug Localization

Caumartin, Genevieve, Melo, Glaucia

arXiv.org Artificial IntelligenceDec-9-2025

Bug localization remains a critical yet time-consuming challenge in large-scale software repositories. Traditional information retrieval-based bug localization (IRBL) methods rely on unchanged bug descriptions, which often contain noisy information, leading to poor retrieval accuracy. Recent advances in large language models (LLMs) have improved bug localization through query reformulation, yet the effect on agent performance remains unexplored. In this study, we investigate how an LLM-powered agent can improve file-level bug localization via lightweight query reformulation and summarization. We first employ an open-source, non-fine-tuned LLM to extract key information from bug reports, such as identifiers and code snippets, and reformulate queries pre-retrieval. Our agent then orchestrates BM25 retrieval using these preprocessed queries, automating localization workflow at scale. Using the best-performing query reformulation technique, our agent achieves 35% better ranking in first-file retrieval than our BM25 baseline and up to +22% file retrieval performance over SWE-agent.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.07022

Country:

Europe (1.00)
North America > United States (0.47)
South America > Brazil > Rio de Janeiro (0.15)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Natural Language Summarization Enables Multi-Repository Bug Localization by LLMs in Microservice Architectures

Oskooei, Amirkia Rafiei, Yukcu, S. Selcan, Bozoglan, Mehmet Cevheri, Aktas, Mehmet S.

arXiv.org Artificial IntelligenceDec-8-2025

Bug localization in multi-repository microservice architectures is challenging due to the semantic gap between natural language bug reports and code, LLM context limitations, and the need to first identify the correct repository. We propose reframing this as a natural language reasoning task by transforming codebases into hierarchical NL summaries and performing NL-to-NL search instead of cross-modal retrieval. Our approach builds context-aware summaries at file, directory, and repository levels, then uses a two-phase search: first routing bug reports to relevant repositories, then performing top-down localization within those repositories. Evaluated on DNext, an industrial system with 46 repositories and 1.1M lines of code, our method achieves Pass@10 of 0.82 and MRR of 0.50, significantly outperforming retrieval baselines and agentic RAG systems like GitHub Copilot and Cursor. This work demonstrates that engineered natural language representations can be more effective than raw source code for scalable bug localization, providing an interpretable repository -> directory -> file search path, which is vital for building trust in enterprise AI tools by providing essential transparency.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2512.05908

Country:

South America > Brazil > Rio de Janeiro (0.17)
Asia > Middle East > Republic of Türkiye (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Practical Code RAG at Scale: Task-Aware Retrieval Design Choices under Compute Budgets

Galimzyanov, Timur, Kolomyttseva, Olga, Bogomolov, Egor

arXiv.org Artificial IntelligenceOct-24-2025

We study retrieval design for code-focused generation tasks under realistic compute budgets. Using two complementary tasks from Long Code Arena -- code completion and bug localization -- we systematically compare retrieval configurations across various context window sizes along three axes: (i) chunking strategy, (ii) similarity scoring, and (iii) splitting granularity. (1) For PL-PL, sparse BM25 with word-level splitting is the most effective and practical, significantly outperforming dense alternatives while being an order of magnitude faster. (2) For NL-PL, proprietary dense encoders (Voyager-3 family) consistently beat sparse retrievers, however requiring 100x larger latency. (3) Optimal chunk size scales with available context: 32-64 line chunks work best at small budgets, and whole-file retrieval becomes competitive at 16000 tokens. (4) Simple line-based chunking matches syntax-aware splitting across budgets. (5) Retrieval latency varies by up to 200x across configurations; BPE-based splitting is needlessly slow, and BM25 + word splitting offers the best quality-latency trade-off. Thus, we provide evidence-based recommendations for implementing effective code-oriented RAG systems based on task requirements, model constraints, and computational efficiency.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.20609

Country:

Asia (0.46)
North America > United States (0.46)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.48)

Add feedback

f29a179746902e331572c483c45e5086-AuthorFeedback.pdf

Neural Information Processing SystemsAug-20-2025, 09:02:21 GMT

attribution, identifier, reviewer, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.50)

Add feedback

Understanding Software Engineering Agents Through the Lens of Traceability: An Empirical Study

Ceka, Ira, Pujar, Saurabh, Ramji, Shyam, Buratti, Luca, Kaiser, Gail, Ray, Baishakhi

arXiv.org Artificial IntelligenceJun-11-2025

With the advent of large language models (LLMs), software engineering agents (SWE agents) have emerged as a powerful paradigm for automating a range of software tasks -- from code generation and repair to test case synthesis. These agents operate autonomously by interpreting user input and responding to environmental feedback. While various agent architectures have demonstrated strong empirical performance, the internal decision-making worfklows that drive their behavior remain poorly understood. Deeper insight into these workflows hold promise for improving both agent reliability and efficiency. In this work, we present the first systematic study of SWE agent behavior through the lens of execution traces. Our contributions are as follows: (1) we propose the first taxonomy of decision-making pathways across five representative agents; (2) using this taxonomy, we identify three core components essential to agent success -- bug localization, patch generation, and reproduction test generation -- and study each in depth; (3) we study the impact of test generation on successful patch production; and analyze strategies that can lead to successful test generation; (4) we further conduct the first large-scale code clone analysis comparing agent-generated and developer-written patches and provide a qualitative study revealing structural and stylistic differences in patch content. Together, these findings offer novel insights into agent design and open avenues for building agents that are both more effective and more aligned with human development practices.

artificial intelligence, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2506.08311

Country: North America > United States (0.46)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Improved IR-based Bug Localization with Intelligent Relevance Feedback

Samir, Asif Mohammed, Rahman, Mohammad Masudur

arXiv.org Artificial IntelligenceJan-17-2025

Software bugs pose a significant challenge during development and maintenance, and practitioners spend nearly 50% of their time dealing with bugs. Many existing techniques adopt Information Retrieval (IR) to localize a reported bug using textual and semantic relevance between bug reports and source code. However, they often struggle to bridge a critical gap between bug reports and code that requires in-depth contextual understanding, which goes beyond textual or semantic relevance. In this paper, we present a novel technique for bug localization - BRaIn - that addresses the contextual gaps by assessing the relevance between bug reports and code with Large Language Models (LLM). It then leverages the LLM's feedback (a.k.a., Intelligent Relevance Feedback) to reformulate queries and re-rank source documents, improving bug localization. We evaluate BRaIn using a benchmark dataset, Bench4BL, and three performance metrics and compare it against six baseline techniques from the literature. Our experimental results show that BRaIn outperforms baselines by 87.6%, 89.5%, and 48.8% margins in MAP, MRR, and HIT@K, respectively. Additionally, it can localize approximately 52% of bugs that cannot be localized by the baseline techniques due to the poor quality of corresponding bug reports. By addressing the contextual gaps and introducing Intelligent Relevance Feedback, BRaIn advances not only theory but also improves IR-based bug localization.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2501.10542

Country:

North America > Canada > Nova Scotia > Halifax Regional Municipality > Halifax (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.47)

Add feedback

Location is Key: Leveraging Large Language Model for Functional Bug Localization in Verilog

Yao, Bingkun, Wang, Ning, Zhou, Jie, Wang, Xi, Gao, Hong, Jiang, Zhe, Guan, Nan

arXiv.org Artificial IntelligenceSep-29-2024

Bug localization in Verilog code is a crucial and time-consuming task during the verification of hardware design. Since introduction, Large Language Models (LLMs) have showed their strong programming capabilities. However, no work has yet considered using LLMs for bug localization in Verilog code. This paper presents Location-is-Key, an opensource LLM solution to locate functional errors in Verilog snippets. LiK achieves high localization accuracy, with a pass@1 localization accuracy of 93.3% on our test dataset based on RTLLM, surpassing GPT-4's 77.9% and comparable to Claude-3.5's 90.8%. Additionally, the bug location obtained by LiK significantly improves GPT-3.5's bug repair efficiency (Functional pass@1 increased from 40.39% to 58.92%), highlighting the importance of bug localization in LLM-based Verilog debugging. Compared to existing methods, LiK only requires the design specification and the erroneous code snippet, without the need for testbenches, assertions, or any other EDA tools. This research demonstrates the feasibility of using LLMs for Verilog error localization, thus providing a new direction for automatic Verilog code debugging.

bug localization, buggy line, localization, (14 more...)

arXiv.org Artificial Intelligence

2409.15186

Country:

North America > United States > District of Columbia > Washington (0.05)
Asia > China > Hong Kong (0.05)
Asia > China > Jiangsu Province > Nanjing (0.04)
(2 more...)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Add feedback

BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning

Chakraborty, Partha, Alfadel, Mahmoud, Nagappan, Meiyappan

arXiv.org Artificial IntelligenceAug-19-2024

Software bugs require developers to exert significant effort to identify and resolve them, often consuming about one-third of their time. Bug localization, the process of pinpointing the exact source code files that need modification, is crucial in reducing this effort. Existing bug localization tools, typically reliant on deep learning techniques, face limitations in cross-project applicability and effectiveness in multi-language environments. Recent advancements with Large Language Models (LLMs) offer detailed representations for bug localization. However, they encounter challenges with limited context windows and mapping accuracy. To address these issues, we propose BLAZE, an approach that employs dynamic chunking and hard example learning. First, BLAZE dynamically segments source code to minimize continuity loss. Then, BLAZE fine-tunes a GPT-based model using challenging bug cases, in order to enhance cross-project and cross-language bug localization. To support the capability of BLAZE, we create the BEETLEBOX dataset, which comprises 26,321 bugs from 29 large and thriving open-source projects across five different programming languages (Java, C++, Python, Go, and JavaScript). Our evaluations of BLAZE on three benchmark datasets BEETLEBOX, SWE-Bench, and Ye et al. demonstrate substantial improvements compared to six state-of-the-art baselines. Specifically, BLAZE achieves up to an increase of 120% in Top 1 accuracy, 144% in Mean Average Precision (MAP), and 100% in Mean Reciprocal Rank (MRR). An extensive ablation study confirms the contributions of our pipeline components to the overall performance enhancement.

bug localization, bug report, localization, (14 more...)

arXiv.org Artificial Intelligence

2407.17631

Country:

North America > Canada > Ontario > Toronto (0.04)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
North America > United States > Maryland > Montgomery County > Gaithersburg (0.04)
North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Enhancing the Code Debugging Ability of LLMs via Communicative Agent Based Data Refinement

Yang, Weiqing, Wang, Hanbin, Liu, Zhenghao, Li, Xinze, Yan, Yukun, Wang, Shuo, Gu, Yu, Yu, Minghe, Liu, Zhiyuan, Yu, Ge

arXiv.org Artificial IntelligenceAug-9-2024

Debugging is a vital aspect of software development, yet the debugging capabilities of Large Language Models (LLMs) remain largely unexplored. This paper first introduces DEBUGEVAL, a comprehensive benchmark designed to evaluate the debugging capabilities of LLMs. DEBUGEVAL collects data from existing high-quality datasets and designs four different tasks to evaluate the debugging effectiveness, including BUG Localization, BUG Identification, Code Review, and Code Repair. Additionally, to enhance the code debugging ability of LLMs, this paper proposes a CoMmunicative Agent BaSed DaTa REfinement FRamework (MASTER), which generates the refined code debugging data for supervised finetuning. Specifically, MASTER employs the Code Quizzer to generate refined data according to the defined tasks of DEBUGEVAL. Then the Code Learner acts as a critic and reserves the generated problems that it can not solve. Finally, the Code Teacher provides a detailed Chain-of-Thought based solution to deal with the generated problem. We collect the synthesized data and finetune the Code Learner to enhance the debugging ability and conduct the NeuDebugger model. Our experiments evaluate various LLMs and NeuDebugger in the zero-shot setting on DEBUGEVAL. Experimental results demonstrate that these 7B-scale LLMs have weaker debugging capabilities, even these code-oriented LLMs. On the contrary, these larger models (over 70B) show convincing debugging ability. Our further analyses illustrate that MASTER is an effective method to enhance the code debugging ability by synthesizing data for Supervised Fine-Tuning (SFT) LLMs.

buggy code, language model, llm, (14 more...)

arXiv.org Artificial Intelligence

2408.05006

Country:

Asia > China > Liaoning Province > Shenyang (0.04)
Asia > China > Beijing > Beijing (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
(2 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback