fault localization
RankingPolicyDecisions
Inarunwith ntimesteps,apolicy will makendecisions on actions totake; we conjecture that only asmall subset of these decisions delivers value over selecting a simple default action. Given atrained policy,we propose anovel black-box method based on statistical fault localisation that ranks thestates oftheenvironment according totheimportance ofdecisions made inthose states. Weargue that among other things, theranked list ofstates can help explain and understand the policy. As the ranking method is statistical, a direct evaluation of its quality is hard.
CodeFlowLM: Incremental Just-In-Time Defect Prediction with Pretrained Language Models and Exploratory Insights into Defect Localization
Monteiro, Monique Louise, Cabral, George G., OLiveira, Adriano L. I.
CodeT5+: CodeT5+ was initially chosen as one of the baselines because it was among the top-performing models in our experiments on defect prediction (Monteiro et al., 2025). Although CodeT5+ does not contain an explicit [CLS] token, as in BERT-based language models, we still use the first encoded token as the head of the classification layer. Therefore, we maintain the default practice of inspecting the weights of the first token attention heads. UniXCoder: In the same way as in CodeT5+, UniXCoder was also among the top performers in defect prediction experiments (Monteiro et al., 2025), so we keep the same default strategy of using the first encoded token attention weights. We also initially considered JIT-Block (Huang et al., 2024) and JIT-CF (Ju et al., 2025). Regarding JIT-Block, its authors reconstructed the dataset (JIT-Defects4J) into the changed block format, which preserves the relative positional information between added and deleted code lines -- information lost in traditional datasets -- thus facilitating the model's ability to learn the semantic meaning of code changes. So, as the dataset was changed, it would not be possible to conduct a fair comparison. Finally, according to its published results, JIT-CF does not achieve better results than JIT-Smart. A consolidated overview of the baseline classifiers is presented in Table 2. 3.4 Description of the Experiments RQ1 How do pre-trained language models perform in comparison to traditional machine learning approaches for continual within-project and cross-project Just-in-Time Software Defect Prediction (JIT-SDP)?
- North America > United States > New York > New York County > New York City (0.05)
- South America > Brazil > Pernambuco > Recife (0.04)
- North America > Canada > Ontario > Waterloo Region > Waterloo (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Time Travel: LLM-Assisted Semantic Behavior Localization with Git Bisect
We present a novel framework that integrates Large Language Models (LLMs) into the Git bisect process for semantic fault localization. Traditional bisect assumes deterministic predicates and binary failure states assumptions often violated in modern software development due to flaky tests, nonmonotonic regressions, and semantic divergence from upstream repositories. Our system augments bisect traversal with structured chain of thought reasoning, enabling commit by commit analysis under noisy conditions. We evaluate multiple open source and proprietary LLMs for their suitability and fine tune DeepSeekCoderV2 using QLoRA on a curated dataset of semantically labeled diffs. We adopt a weak supervision workflow to reduce annotation overhead, incorporating human in the loop corrections and self consistency filtering. Experiments across multiple open source projects show a 6.4 point absolute gain in success rate from 74.2 to 80.6 percent, leading to significantly fewer failed traversals and by experiment up to 2x reduction in average bisect time. We conclude with discussions on temporal reasoning, prompt design, and finetuning strategies tailored for commit level behavior analysis.
Inferring multiple helper Dafny assertions with LLMs
Silva, Álvaro, Mendes, Alexandra, Martins, Ruben
The Dafny verifier provides strong correctness guarantees but often requires numerous manual helper assertions, creating a significant barrier to adoption. We investigate the use of Large Language Models (LLMs) to automatically infer missing helper assertions in Dafny programs, with a primary focus on cases involving multiple missing assertions. To support this study, we extend the DafnyBench benchmark with curated datasets where one, two, or all assertions are removed, and we introduce a taxonomy of assertion types to analyze inference difficulty. Our approach refines fault localization through a hybrid method that combines LLM predictions with error-message heuristics. We implement this approach in a new tool called DAISY (Dafny Assertion Inference SYstem). While our focus is on multiple missing assertions, we also evaluate DAISY on single-assertion cases. DAISY verifies 63.4% of programs with one missing assertion and 31.7% with multiple missing assertions. Notably, many programs can be verified with fewer assertions than originally present, highlighting that proofs often admit multiple valid repair strategies and that recovering every original assertion is unnecessary. These results demonstrate that automated assertion inference can substantially reduce proof engineering effort and represent a step toward more scalable and accessible formal verification.
- Europe > Portugal > Porto > Porto (0.40)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (8 more...)
Generalization of Graph Neural Network Models for Distribution Grid Fault Detection
Karabulut, Burak, Manna, Carlo, Develder, Chris
Fault detection in power distribution grids is critical for ensuring system reliability and preventing costly outages. Moreover, fault detection methodologies should remain robust to evolving grid topologies caused by factors such as reconfigurations, equipment failures, and Distributed Energy Resource (DER) integration. Current data-driven state-of-the-art methods use Recurrent Neural Networks (RNNs) for temporal modeling and Graph Neural Networks (GNNs) for spatial learning, in an RNN+GNN pipeline setting (RGNN in short). Specifically, for power system fault diagnosis, Graph Convolutional Networks (GCNs) have been adopted. Yet, various more advanced GNN architectures have been proposed and adopted in domains outside of power systems. In this paper, we set out to systematically and consistently benchmark various GNN architectures in an RNN+GNN pipeline model. Specifically, to the best of our knowledge, we are the first to (i) propose to use GraphSAGE and Graph Attention (GAT, GATv2) in an RGNN for fault diagnosis, and (ii) provide a comprehensive benchmark against earlier proposed RGNN solutions (RGCN) as well as pure RNN models (especially Gated Recurrent Unit (GRU)), particularly (iii) exploring their generalization potential for deployment in different settings than those used for training them. Our experimental results on the IEEE 123-node distribution network show that RGATv2 has superior generalization capabilities, maintaining high performance with an F1-score reduction of $\sim$12% across different topology settings. In contrast, pure RNN models largely fail, experiencing an F1-score reduction of up to $\sim$60%, while other RGNN variants also exhibit significant performance degradation, i.e., up to $\sim$25% lower F1-scores.
- Europe > Latvia > Līvāni Municipality > Līvāni (0.05)
- South America > Ecuador > Pichincha Province > Quito (0.04)
- North America > United States > Massachusetts > Suffolk County > Boston (0.04)
- (5 more...)
- Energy > Power Industry (1.00)
- Energy > Renewable > Solar (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Who is Introducing the Failure? Automatically Attributing Failures of Multi-Agent Systems via Spectrum Analysis
Ge, Yu, Xie, Linna, Li, Zhong, Pei, Yu, Zhang, Tian
Large Language Model Powered Multi-Agent Systems (MASs) are increasingly employed to automate complex real-world problems, such as programming and scientific discovery. Despite their promising, MASs are not without their flaws. However, failure attribution in MASs - pinpointing the specific agent actions responsible for failures - remains underexplored and labor-intensive, posing significant challenges for debugging and system improvement. To bridge this gap, we propose FAMAS, the first spectrum-based failure attribution approach for MASs, which operates through systematic trajectory replay and abstraction, followed by spectrum analysis.The core idea of FAMAS is to estimate, from variations across repeated MAS executions, the likelihood that each agent action is responsible for the failure. In particular, we propose a novel suspiciousness formula tailored to MASs, which integrates two key factor groups, namely the agent behavior group and the action behavior group, to account for the agent activation patterns and the action activation patterns within the execution trajectories of MASs. Through expensive evaluations against 12 baselines on the Who and When benchmark, FAMAS demonstrates superior performance by outperforming all the methods in comparison.
- Europe > Austria > Vienna (0.14)
- Asia > China > Hong Kong (0.04)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- (14 more...)
- North America > United States (0.14)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Robots (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
FairFLRep: Fairness aware fault localization and repair of Deep Neural Networks
Openja, Moses, Arcaini, Paolo, Khomh, Foutse, Ishikawa, Fuyuki
Deep neural networks (DNNs) are being utilized in various aspects of our daily lives, including high-stakes decision-making applications that impact individuals. However, these systems reflect and amplify bias from the data used during training and testing, potentially resulting in biased behavior and inaccurate decisions. For instance, having different misclassification rates between white and black sub-populations. However, effectively and efficiently identifying and correcting biased behavior in DNNs is a challenge. This paper introduces FairFLRep, an automated fairness-aware fault localization and repair technique that identifies and corrects potentially bias-inducing neurons in DNN classifiers. FairFLRep focuses on adjusting neuron weights associated with sensitive attributes, such as race or gender, that contribute to unfair decisions. By analyzing the input-output relationships within the network, FairFLRep corrects neurons responsible for disparities in predictive quality parity. We evaluate FairFLRep on four image classification datasets using two DNN classifiers, and four tabular datasets with a DNN model. The results show that FairFLRep consistently outperforms existing methods in improving fairness while preserving accuracy. An ablation study confirms the importance of considering fairness during both fault localization and repair stages. Our findings also show that FairFLRep is more efficient than the baseline approaches in repairing the network.
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- Oceania > Australia > Victoria > Melbourne (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
- (17 more...)
- Law (1.00)
- Health & Medicine (1.00)
- Education (0.67)
- Government > Regional Government (0.45)
SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution
Li, Han, Shi, Yuling, Lin, Shaoxin, Gu, Xiaodong, Lian, Heng, Wang, Xin, Jia, Yantao, Huang, Tao, Wang, Qianxiang
Issue resolution has made remarkable progress thanks to the advanced reasoning capabilities of large language models (LLMs). Recently, agent-based frameworks such as SWE-agent have further advanced this progress by enabling autonomous, tool-using agents to tackle complex software engineering tasks. While existing agent-based issue resolution approaches are primarily based on agents' independent explorations, they often get stuck in local solutions and fail to identify issue patterns that span across different parts of the codebase. To address this limitation, we propose SWE-Debate, a competitive multi-agent debate framework that encourages diverse reasoning paths and achieves more consolidated issue localization. SWE-Debate first creates multiple fault propagation traces as localization proposals by traversing a code dependency graph. Then, it organizes a three-round debate among specialized agents, each embodying distinct reasoning perspectives along the fault propagation trace. This structured competition enables agents to collaboratively converge on a consolidated fix plan. Finally, this consolidated fix plan is integrated into an MCTS-based code modification agent for patch generation. Experiments on the SWE-bench benchmark show that SWE-Debate achieves new state-of-the-art results in open-source agent frameworks and outperforms baselines by a large margin.
- Europe > Austria > Vienna (0.14)
- North America > United States > District of Columbia > Washington (0.05)
- Asia > China > Shanghai > Shanghai (0.04)
- (7 more...)
Data Mining-Based Techniques for Software Fault Localization
Cellier, Peggy, Ducassé, Mireille, Ferré, Sébastien, Ridoux, Olivier, Wong, W. Eric
This chapter illustrates the basic concepts of fault localization using a data mining technique. It utilizes the Trityp program to illustrate the general method. Formal concept analysis and association rule are two well-known methods for symbolic data mining. In their original inception, they both consider data in the form of an object-attribute table. In their original inception, they both consider data in the form of an object-attribute table. The chapter considers a debugging process in which a program is tested against different test cases. Two attributes, PASS and FAIL, represent the issue of the test case. The chapter extends the analysis of data mining for fault localization for the multiple fault situations. It addresses how data mining can be further applied to fault localization for GUI components. Unlike traditional software, GUI test cases are usually event sequences, and each individual event has a unique corresponding event handler.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > New York > New York County > New York City (0.05)
- Asia > Middle East > Jordan (0.04)
- (5 more...)
- Information Technology > Data Science > Data Mining (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Object-Oriented Architecture (0.68)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.50)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.49)