AITopics | fault localization

fault localization

Ranking Policy Decisions

Neural Information Processing Systems, Feb-8-2026, 12:15:46 GMT

In a run with n timesteps, a policy will make n decisions on actions to take; we conjecture that only a small subset of these decisions delivers value over selecting a simple default action. Given a trained policy, we propose a novel black-box method based on statistical fault localisation that ranks the states of the environment according to the importance of decisions made in those states. We argue that among other things, the ranked list of states can help explain and understand the policy. As the ranking method is statistical, a direct evaluation of its quality is hard.
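
The ranking idea can be illustrated with a small sketch. It is not the paper's procedure: it assumes a toy setting in which each run replaces the policy's action with the default action in a chosen set of states, a run "fails" when any critical state is replaced, and states are scored with the standard Tarantula measure from statistical fault localisation. The names `simulate_runs`, `rank_states`, and the notion of "critical" states are hypothetical.

```python
from itertools import combinations


def tarantula(ep, ef, total_pass, total_fail):
    """Standard Tarantula score; here a high score marks an important state."""
    fail_ratio = ef / total_fail if total_fail else 0.0
    pass_ratio = ep / total_pass if total_pass else 0.0
    denom = fail_ratio + pass_ratio
    return fail_ratio / denom if denom else 0.0


def simulate_runs(states, critical):
    """Hypothetical toy environment: one run per pair of states whose policy
    action is replaced by the default. A run passes only if no critical
    state was replaced."""
    runs = []
    for pair in combinations(states, 2):
        replaced = set(pair)
        passed = not (replaced & set(critical))
        runs.append((replaced, passed))
    return runs


def rank_states(runs, states):
    """Rank states by how strongly replacing the policy's action there
    is associated with failing runs."""
    states = list(states)
    total_pass = sum(1 for _, passed in runs if passed)
    total_fail = len(runs) - total_pass
    scores = {}
    for s in states:
        ep = sum(1 for replaced, passed in runs if s in replaced and passed)
        ef = sum(1 for replaced, passed in runs if s in replaced and not passed)
        scores[s] = tarantula(ep, ef, total_pass, total_fail)
    ranking = sorted(states, key=lambda s: scores[s], reverse=True)
    return ranking, scores


ranking, scores = rank_states(simulate_runs(range(6), {2, 4}), range(6))
print(ranking[:2])  # the two assumed critical states come first: [2, 4]
```

In this toy setting the states where the default action always breaks the run receive the maximum score, while the remaining states score lower because replacing them also occurs in passing runs.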

artificial intelligence, execution, machine learning

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)

CodeFlowLM: Incremental Just-In-Time Defect Prediction with Pretrained Language Models and Exploratory Insights into Defect Localization

Monteiro, Monique Louise, Cabral, George G., Oliveira, Adriano L. I.

arXiv.org Artificial Intelligence, Dec-2-2025

CodeT5+: CodeT5+ was initially chosen as one of the baselines because it was among the top-performing models in our experiments on defect prediction (Monteiro et al., 2025). Although CodeT5+ does not contain an explicit [CLS] token, as BERT-based language models do, we still use the first encoded token as the input to the classification layer. Therefore, we maintain the default practice of inspecting the attention-head weights of the first token.

UniXCoder: Like CodeT5+, UniXCoder was among the top performers in the defect prediction experiments (Monteiro et al., 2025), so we keep the same default strategy of using the attention weights of the first encoded token.

We also initially considered JIT-Block (Huang et al., 2024) and JIT-CF (Ju et al., 2025). The authors of JIT-Block reconstructed the dataset (JIT-Defects4J) into the changed block format, which preserves the relative positional information between added and deleted code lines (information lost in traditional datasets), thus facilitating the model's ability to learn the semantic meaning of code changes. Because the dataset was changed, a fair comparison would not be possible. Finally, according to its published results, JIT-CF does not achieve better results than JIT-Smart. A consolidated overview of the baseline classifiers is presented in Table 2.

3.4 Description of the Experiments

RQ1: How do pre-trained language models perform in comparison to traditional machine learning approaches for continual within-project and cross-project Just-in-Time Software Defect Prediction (JIT-SDP)?

large language model, machine learning, natural language

2512.00231

Country: North America > United States (0.15)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

Time Travel: LLM-Assisted Semantic Behavior Localization with Git Bisect

Wang, Yujing, Hong, Weize

arXiv.org Artificial Intelligence, Nov-25-2025

We present a novel framework that integrates Large Language Models (LLMs) into the Git bisect process for semantic fault localization. Traditional bisect assumes deterministic predicates and binary failure states, assumptions often violated in modern software development due to flaky tests, nonmonotonic regressions, and semantic divergence from upstream repositories. Our system augments bisect traversal with structured chain-of-thought reasoning, enabling commit-by-commit analysis under noisy conditions. We evaluate multiple open-source and proprietary LLMs for their suitability and fine-tune DeepSeekCoderV2 using QLoRA on a curated dataset of semantically labeled diffs. We adopt a weak supervision workflow to reduce annotation overhead, incorporating human-in-the-loop corrections and self-consistency filtering. Experiments across multiple open-source projects show a 6.4-point absolute gain in success rate, from 74.2 to 80.6 percent, leading to significantly fewer failed traversals and, in our experiments, up to a 2x reduction in average bisect time. We conclude with discussions on temporal reasoning, prompt design, and fine-tuning strategies tailored for commit-level behavior analysis.
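
To make the "noisy predicate" problem concrete, here is a minimal sketch of a bisect-style search that tolerates a flaky test. It does not reproduce the paper's LLM-based judge: it swaps in plain majority voting over repeated evaluations, and it still assumes a monotonic history (all good commits precede all bad ones). The helper names `make_flaky_predicate`, `majority_verdict`, and `bisect_first_bad` are hypothetical.

```python
from collections import Counter


def make_flaky_predicate(first_bad, flip_every=3):
    """Hypothetical flaky test: returns the wrong answer on every
    `flip_every`-th call, otherwise reports commit >= first_bad."""
    calls = {"n": 0}

    def is_bad(commit):
        calls["n"] += 1
        truth = commit >= first_bad
        return (not truth) if calls["n"] % flip_every == 0 else truth

    return is_bad


def majority_verdict(is_bad, commit, votes=5):
    """Query the predicate `votes` times (use an odd number) and keep
    the majority answer."""
    counts = Counter(is_bad(commit) for _ in range(votes))
    return counts[True] > counts[False]


def bisect_first_bad(commits, is_bad, votes=5):
    """Return the index of the first bad commit, assuming commits[0] is good,
    commits[-1] is bad, and the history is monotonic."""
    lo, hi = 0, len(commits) - 1  # invariant: commits[lo] good, commits[hi] bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if majority_verdict(is_bad, commits[mid], votes):
            hi = mid
        else:
            lo = mid
    return hi


commits = list(range(100))  # stand-ins for commit hashes, oldest first
print(bisect_first_bad(commits, make_flaky_predicate(first_bad=37)))  # 37
```

With five votes per commit and at most two wrong answers in any five consecutive calls, the majority is always correct in this toy setup. Voting does nothing for nonmonotonic regressions, which is one reason a semantic judge of each diff is attractive.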

large language model, machine learning, regression

2511.18854

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Inferring multiple helper Dafny assertions with LLMs

Silva, Álvaro, Mendes, Alexandra, Martins, Ruben

arXiv.org Artificial Intelligence, Nov-4-2025

The Dafny verifier provides strong correctness guarantees but often requires numerous manual helper assertions, creating a significant barrier to adoption. We investigate the use of Large Language Models (LLMs) to automatically infer missing helper assertions in Dafny programs, with a primary focus on cases involving multiple missing assertions. To support this study, we extend the DafnyBench benchmark with curated datasets where one, two, or all assertions are removed, and we introduce a taxonomy of assertion types to analyze inference difficulty. Our approach refines fault localization through a hybrid method that combines LLM predictions with error-message heuristics. We implement this approach in a new tool called DAISY (Dafny Assertion Inference SYstem). While our focus is on multiple missing assertions, we also evaluate DAISY on single-assertion cases. DAISY verifies 63.4% of programs with one missing assertion and 31.7% with multiple missing assertions. Notably, many programs can be verified with fewer assertions than originally present, highlighting that proofs often admit multiple valid repair strategies and that recovering every original assertion is unnecessary. These results demonstrate that automated assertion inference can substantially reduce proof engineering effort and represent a step toward more scalable and accessible formal verification.

assertion, large language model, natural language

2511.00125

Country:

Europe (1.00)
North America > United States (0.68)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Generalization of Graph Neural Network Models for Distribution Grid Fault Detection

Karabulut, Burak, Manna, Carlo, Develder, Chris

arXiv.org Artificial Intelligence, Oct-7-2025

Fault detection in power distribution grids is critical for ensuring system reliability and preventing costly outages. Moreover, fault detection methodologies should remain robust to evolving grid topologies caused by factors such as reconfigurations, equipment failures, and Distributed Energy Resource (DER) integration. Current data-driven state-of-the-art methods use Recurrent Neural Networks (RNNs) for temporal modeling and Graph Neural Networks (GNNs) for spatial learning, in an RNN+GNN pipeline setting (RGNN for short). Specifically, for power system fault diagnosis, Graph Convolutional Networks (GCNs) have been adopted. Yet, various more advanced GNN architectures have been proposed and adopted in domains outside of power systems. In this paper, we set out to systematically and consistently benchmark various GNN architectures in an RNN+GNN pipeline model. Specifically, to the best of our knowledge, we are the first to (i) propose to use GraphSAGE and Graph Attention (GAT, GATv2) in an RGNN for fault diagnosis, and (ii) provide a comprehensive benchmark against earlier proposed RGNN solutions (RGCN) as well as pure RNN models (especially Gated Recurrent Unit (GRU)), particularly (iii) exploring their generalization potential for deployment in settings different from those used for training them. Our experimental results on the IEEE 123-node distribution network show that RGATv2 has superior generalization capabilities, maintaining high performance with an F1-score reduction of ~12% across different topology settings. In contrast, pure RNN models largely fail, experiencing an F1-score reduction of up to ~60%, while other RGNN variants also exhibit significant performance degradation, i.e., up to ~25% lower F1-scores.

artificial intelligence, expert system, machine learning

2510.03571

Country:

Europe (0.68)
North America > United States (0.46)

Genre: Research Report (1.00)

Industry:

Energy > Power Industry (1.00)
Energy > Renewable > Solar (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis

Ge, Yu, Xie, Linna, Li, Zhong, Pei, Yu, Zhang, Tian

arXiv.org Artificial Intelligence, Sep-18-2025

Large Language Model Powered Multi-Agent Systems (MASs) are increasingly employed to automate complex real-world problems, such as programming and scientific discovery. Despite their promise, MASs are not without flaws. Failure attribution in MASs - pinpointing the specific agent actions responsible for failures - remains underexplored and labor-intensive, posing significant challenges for debugging and system improvement. To bridge this gap, we propose FAMAS, the first spectrum-based failure attribution approach for MASs, which operates through systematic trajectory replay and abstraction, followed by spectrum analysis. The core idea of FAMAS is to estimate, from variations across repeated MAS executions, the likelihood that each agent action is responsible for the failure. In particular, we propose a novel suspiciousness formula tailored to MASs, which integrates two key factor groups, namely the agent behavior group and the action behavior group, to account for the agent activation patterns and the action activation patterns within the execution trajectories of MASs. Through extensive evaluations against 12 baselines on the Who and When benchmark, FAMAS demonstrates superior performance by outperforming all the methods in comparison.
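
As an illustration of spectrum analysis over repeated executions, the sketch below scores (agent, action) pairs with the classic Ochiai formula. This is not the FAMAS suspiciousness formula, which the abstract describes as novel and does not spell out; Ochiai is swapped in as a well-known stand-in, and the trajectories, agent names, and the function `ochiai_ranking` are made up for the example.

```python
import math
from collections import defaultdict


def ochiai_ranking(trajectories):
    """trajectories: list of (steps, failed), where steps is a list of
    (agent, action) pairs. Returns pairs sorted by Ochiai suspiciousness:
    ef / sqrt(total_fail * (ef + ep))."""
    total_fail = sum(1 for _, failed in trajectories if failed)
    in_fail = defaultdict(int)  # failing runs that contain the pair
    in_pass = defaultdict(int)  # passing runs that contain the pair
    for steps, failed in trajectories:
        for step in set(steps):  # count each pair once per run
            if failed:
                in_fail[step] += 1
            else:
                in_pass[step] += 1

    scores = {}
    for step in set(in_fail) | set(in_pass):
        ef, ep = in_fail[step], in_pass[step]
        denom = math.sqrt(total_fail * (ef + ep))
        scores[step] = ef / denom if denom else 0.0
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical replayed executions of the same task.
trajectories = [
    ([("planner", "decompose"), ("coder", "write_patch"), ("tester", "run_tests")], False),
    ([("planner", "decompose"), ("coder", "skip_tests"), ("tester", "run_tests")], True),
    ([("planner", "decompose"), ("coder", "skip_tests")], True),
    ([("planner", "decompose"), ("coder", "write_patch"), ("tester", "run_tests")], False),
]

for step, score in ochiai_ranking(trajectories):
    print(step, round(score, 3))
```

Here the pair that appears only in failing runs ranks first, while a pair present in every run (the planner step) gets a middling score because it does not discriminate between outcomes.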

artificial intelligence, execution trajectory, trajectory

2509.13782

Country:

North America > United States (1.00)
Asia (1.00)
Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

FairFLRep: Fairness aware fault localization and repair of Deep Neural Networks

Openja, Moses, Arcaini, Paolo, Khomh, Foutse, Ishikawa, Fuyuki

arXiv.org Artificial Intelligence, Aug-12-2025

Deep neural networks (DNNs) are being utilized in various aspects of our daily lives, including high-stakes decision-making applications that impact individuals. However, these systems reflect and amplify bias from the data used during training and testing, potentially resulting in biased behavior and inaccurate decisions; for instance, misclassification rates may differ between white and black sub-populations. Effectively and efficiently identifying and correcting biased behavior in DNNs remains a challenge. This paper introduces FairFLRep, an automated fairness-aware fault localization and repair technique that identifies and corrects potentially bias-inducing neurons in DNN classifiers. FairFLRep focuses on adjusting neuron weights associated with sensitive attributes, such as race or gender, that contribute to unfair decisions. By analyzing the input-output relationships within the network, FairFLRep corrects neurons responsible for disparities in predictive quality parity. We evaluate FairFLRep on four image classification datasets using two DNN classifiers, and four tabular datasets with a DNN model. The results show that FairFLRep consistently outperforms existing methods in improving fairness while preserving accuracy. An ablation study confirms the importance of considering fairness during both fault localization and repair stages. Our findings also show that FairFLRep is more efficient than the baseline approaches in repairing the network.
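
The kind of disparity mentioned above can be measured directly. The following sketch only computes per-group misclassification rates and their gap on made-up labels; it is not part of FairFLRep, and the function name and data are hypothetical.

```python
from collections import defaultdict


def misclassification_rates(y_true, y_pred, groups):
    """Return {group: fraction of that group's samples that were misclassified}."""
    errors = defaultdict(int)
    totals = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        errors[group] += int(truth != pred)
    return {group: errors[group] / totals[group] for group in totals}


# Made-up labels and predictions for two sub-populations "a" and "b".
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

rates = misclassification_rates(y_true, y_pred, groups)
gap = max(rates.values()) - min(rates.values())
print(rates, gap)  # {'a': 0.25, 'b': 0.5} 0.25
```

A repair technique that aims at fairness would try to shrink this gap without raising the overall error rate.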

artificial intelligence, fairflrep, machine learning

2508.08151

Country:

Europe (1.00)
Asia (1.00)
Oceania > Australia (0.67)
North America > United States > California (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Health & Medicine (1.00)
Education (0.67)
Government > Regional Government (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution

Li, Han, Shi, Yuling, Lin, Shaoxin, Gu, Xiaodong, Lian, Heng, Wang, Xin, Jia, Yantao, Huang, Tao, Wang, Qianxiang

arXiv.org Artificial Intelligence, Aug-1-2025

Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have further advanced this progress by enabling autonomous, tool-using agents to tackle complex software engineering tasks. Because existing agent-based issue resolution approaches are primarily based on agents' independent explorations, they often get stuck in local solutions and fail to identify issue patterns that span different parts of the codebase. To address this limitation, we propose SWE-Debate, a competitive multi-agent debate framework that encourages diverse reasoning paths and achieves more consolidated issue localization. SWE-Debate first creates multiple fault propagation traces as localization proposals by traversing a code dependency graph. Then, it organizes a three-round debate among specialized agents, each embodying distinct reasoning perspectives along the fault propagation trace. This structured competition enables agents to collaboratively converge on a consolidated fix plan. Finally, this consolidated fix plan is integrated into an MCTS-based code modification agent for patch generation. Experiments on the SWE-bench benchmark show that SWE-Debate achieves new state-of-the-art results among open-source agent frameworks and outperforms baselines by a large margin.

large language model, machine learning, natural language

2507.23348

Country:

North America > United States (1.00)
Asia (0.68)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Data Mining-Based Techniques for Software Fault Localization

Cellier, Peggy, Ducassé, Mireille, Ferré, Sébastien, Ridoux, Olivier, Wong, W. Eric

arXiv.org Artificial Intelligence, May-27-2025

This chapter illustrates the basic concepts of fault localization using a data mining technique. It utilizes the Trityp program to illustrate the general method. Formal concept analysis and association rules are two well-known methods for symbolic data mining. In their original inception, they both consider data in the form of an object-attribute table. The chapter considers a debugging process in which a program is tested against different test cases. Two attributes, PASS and FAIL, represent the outcome of the test case. The chapter extends the analysis of data mining for fault localization to multiple-fault situations. It addresses how data mining can be further applied to fault localization for GUI components. Unlike traditional software, GUI test cases are usually event sequences, and each individual event has a unique corresponding event handler.
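
A small sketch may help show what an object-attribute table and an association rule look like in this setting. It assumes each test execution is an object whose attributes are the executed lines plus PASS or FAIL, and it evaluates rules of the form "lines executed -> FAIL" with the usual support, confidence, and lift statistics. The table contents and the function `rule_stats` are invented for illustration and are not taken from the chapter.

```python
def rule_stats(table, premise, conclusion="FAIL"):
    """Evaluate the rule premise -> conclusion over an object-attribute table.

    table: list of attribute sets, one per test execution.
    Returns (support, confidence, lift), with support as an absolute count.
    """
    premise = frozenset(premise)
    n = len(table)
    with_premise = [row for row in table if premise <= row]
    both = [row for row in with_premise if conclusion in row]
    with_conclusion = sum(1 for row in table if conclusion in row)

    support = len(both)
    confidence = len(both) / len(with_premise) if with_premise else 0.0
    # Lift compares the rule's confidence with the base rate of the conclusion.
    lift = confidence / (with_conclusion / n) if with_conclusion else 0.0
    return support, confidence, lift


# Hypothetical executions: executed lines plus the test outcome.
table = [
    {"line1", "line2", "PASS"},
    {"line1", "line3", "FAIL"},
    {"line1", "line2", "line3", "FAIL"},
    {"line1", "line2", "PASS"},
    {"line1", "line4", "PASS"},
]

print(rule_stats(table, {"line3"}))  # executed only in failing runs
print(rule_stats(table, {"line1"}))  # executed in every run
```

A line executed in every run yields a lift of about 1 and carries no information, whereas a line executed only in failing runs yields a confidence of 1 and a lift well above 1, which makes it a natural first candidate for inspection.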

artificial intelligence, data mining, object-oriented architecture

doi: 10.1002/9781119880929.ch7

2505.18216

Country: North America > United States > California (0.28)

Genre: Research Report (0.40)

Industry: Materials > Metals & Mining (0.34)

Technology: