 Explanation & Argumentation


DiCoFlex: Model-Agnostic Diverse Counterfactuals with Flexible Control

Neural Information Processing Systems

Counterfactual explanations play a pivotal role in explainable artificial intelligence (XAI) by offering intuitive, human-understandable alternatives that elucidate machine learning model decisions. Despite their significance, existing methods for generating counterfactuals often require constant access to the predictive model, involve computationally intensive optimization for each instance, and lack the flexibility to adapt to new user-defined constraints without retraining. In this paper, we propose DiCoFlex, a novel model-agnostic, conditional generative framework that produces multiple diverse counterfactuals in a single forward pass. Leveraging conditional normalizing flows trained solely on labeled data, DiCoFlex addresses key limitations by enabling real-time, user-driven customization of constraints such as sparsity and actionability at inference time. Extensive experiments on standard benchmark datasets show that DiCoFlex outperforms existing methods in terms of validity, diversity, proximity, and constraint adherence, making it a practical and scalable solution for counterfactual generation in sensitive decision-making domains.
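The evaluation criteria named above are commonly computed along the lines of the sketch below. The exact definitions used here are assumptions for illustration and may not be the ones used in the DiCoFlex experiments: L1 distance for proximity, mean pairwise L1 distance for diversity, the count of changed features for sparsity, and unchanged immutable features for actionability. The `predict` classifier and the data are hypothetical.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Set

Vector = Sequence[float]


def proximity(x: Vector, cf: Vector) -> float:
    """L1 distance between the query instance and a counterfactual (lower is closer)."""
    return sum(abs(a - b) for a, b in zip(x, cf))


def sparsity(x: Vector, cf: Vector, tol: float = 1e-9) -> int:
    """Number of features the counterfactual changes (lower is sparser)."""
    return sum(1 for a, b in zip(x, cf) if abs(a - b) > tol)


def diversity(cfs: Sequence[Vector]) -> float:
    """Mean pairwise L1 distance within a set of counterfactuals (higher is more diverse)."""
    pairs = list(combinations(cfs, 2))
    if not pairs:
        return 0.0
    return sum(proximity(a, b) for a, b in pairs) / len(pairs)


def validity(cfs: Sequence[Vector], predict: Callable[[Vector], int], target: int) -> float:
    """Fraction of counterfactuals that the classifier assigns to the desired class."""
    return sum(1 for cf in cfs if predict(cf) == target) / len(cfs)


def respects_immutable(x: Vector, cf: Vector, immutable: Set[int], tol: float = 1e-9) -> bool:
    """Actionability check: features listed as immutable must stay unchanged."""
    return all(abs(x[i] - cf[i]) <= tol for i in immutable)


# Hypothetical toy classifier: positive class when the feature sum exceeds 3.5.
def predict(v: Vector) -> int:
    return int(sum(v) > 3.5)


x = [1.0, 2.0, 0.0]
cfs: List[List[float]] = [[1.0, 3.0, 0.0], [1.0, 2.0, 1.0]]
print(validity(cfs, predict, target=1), diversity(cfs), [sparsity(x, c) for c in cfs])
# prints: 1.0 2.0 [1, 1]
```

A generator that emits several counterfactuals at once is typically judged on all of these together. For example, maximizing diversity alone pushes samples apart and tends to hurt proximity and sparsity.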


AdaptGrad: Adaptive Sampling to Reduce Noise

Neural Information Processing Systems

Gradient smoothing is an efficient approach to reducing noise in gradient-based model explanation methods. SmoothGrad adds Gaussian noise to the input to mitigate much of this noise. However, the crucial hyperparameter in this method, the standard deviation σ of the Gaussian noise, is often set manually or by a heuristic. As a result, the smoothed gradients contain extra noise introduced by the smoothing process itself. In this paper, we analyze this noise and its connection to out-of-range sampling in the smoothing process of SmoothGrad. Based on this insight, we propose AdaptGrad, an adaptive gradient smoothing method that controls out-of-range sampling to minimize noise. Comprehensive qualitative and quantitative experiments demonstrate that AdaptGrad can effectively reduce almost all the noise in vanilla gradients compared to baseline methods. AdaptGrad is simple and universal, making it a practical way to enhance gradient-based interpretability methods and obtain clearer visualizations.
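The sketch below shows plain SmoothGrad: it averages gradients taken at Gaussian-perturbed copies of the input. It also includes a diagnostic that counts how often the perturbed values leave the valid input range, which is the out-of-range sampling the abstract connects to noise. The valid range [0, 1] (for example, normalized pixel intensities) and the analytic `grad_f` are assumptions. This is not AdaptGrad itself, whose adaptive rule is not specified here.

```python
import random
from typing import Callable, List, Sequence

GradFn = Callable[[Sequence[float]], List[float]]


def smoothgrad(grad_fn: GradFn, x: Sequence[float], sigma: float,
               n_samples: int = 50, seed: int = 0) -> List[float]:
    """Average the gradient over n_samples Gaussian-perturbed copies of x."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, sigma) for xi in x]
        total = [t + g for t, g in zip(total, grad_fn(noisy))]
    return [t / n_samples for t in total]


def out_of_range_fraction(x: Sequence[float], sigma: float, lo: float = 0.0,
                          hi: float = 1.0, n_samples: int = 1000, seed: int = 0) -> float:
    """Fraction of perturbed feature values that fall outside [lo, hi]."""
    rng = random.Random(seed)
    outside = 0
    for _ in range(n_samples):
        for xi in x:
            v = xi + rng.gauss(0.0, sigma)
            if v < lo or v > hi:
                outside += 1
    return outside / (n_samples * len(x))


# Hypothetical model f(x) = x0**2 + 3*x1, whose gradient is (2*x0, 3).
def grad_f(v: Sequence[float]) -> List[float]:
    return [2.0 * v[0], 3.0]


x = [0.05, 0.5]  # assumed to live in [0, 1]
for sigma in (0.01, 0.1, 0.3):
    print(sigma, smoothgrad(grad_f, x, sigma), out_of_range_fraction(x, sigma))
```

As σ grows, more perturbed copies of the feature near 0.05 fall below 0. The smoothed gradient then averages over inputs the model never sees in practice, which illustrates why a manually chosen σ can inject extra noise.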


Collective Counterfactual Explanations: Balancing Individual Goals and Collective Dynamics

Neural Information Processing Systems

Counterfactual explanations provide individuals with cost-optimal recommendations to achieve their desired outcomes. However, when a significant number of individuals seek similar state modifications, this individual-centric approach can inadvertently create competition and introduce unforeseen costs. Additionally, disregarding the underlying data distribution may lead to recommendations that individuals perceive as unusual or impractical. To address these challenges, we propose a novel framework that extends standard counterfactual explanations by incorporating a population dynamics model. This framework penalizes deviations from equilibrium after individuals follow the recommendations, effectively mitigating externalities caused by correlated changes across the population. By balancing individual modification costs with their impact on others, our method ensures more equitable and efficient outcomes. We show how this approach reframes the counterfactual explanation problem from an individual-centric task to a collective optimization problem. Augmenting our theoretical insights, we design and implement scalable algorithms for computing collective counterfactuals, showcasing their effectiveness and advantages over existing recourse methods, particularly in aligning with collective objectives.
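To make the trade-off concrete, the toy sketch below scores a joint assignment of individuals to recourse targets. The score is the sum of the individual modification costs plus a quadratic penalty on how far the resulting counts per target deviate from an assumed equilibrium. Several pieces are hypothetical simplifications: the 1-D feature, the fixed `equilibrium` counts, the weight `lam`, and the brute-force search. The paper's population dynamics model and scalable algorithms are not reproduced here.

```python
from itertools import product
from typing import Sequence, Tuple

Assignment = Tuple[int, ...]


def collective_objective(xs: Sequence[float], assignment: Assignment, targets: Sequence[float],
                         equilibrium: Sequence[int], lam: float) -> float:
    """Sum of individual move costs plus a penalty for deviating from equilibrium counts."""
    move_cost = sum(abs(targets[k] - x) for x, k in zip(xs, assignment))
    counts = [assignment.count(k) for k in range(len(targets))]
    penalty = sum((c - e) ** 2 for c, e in zip(counts, equilibrium))
    return move_cost + lam * penalty


def best_assignment(xs: Sequence[float], targets: Sequence[float],
                    equilibrium: Sequence[int], lam: float) -> Assignment:
    """Brute-force search over every joint assignment (only viable for tiny examples)."""
    candidates = product(range(len(targets)), repeat=len(xs))
    return min(candidates, key=lambda a: collective_objective(xs, a, targets, equilibrium, lam))


xs = [0.0, 0.1, 0.2]   # current feature values of three individuals
targets = [1.0, 1.5]   # two hypothetical accepted states
equilibrium = [2, 1]   # hypothetical sustainable number of individuals per state
print(best_assignment(xs, targets, equilibrium, lam=0.0))  # prints: (0, 0, 0)
print(best_assignment(xs, targets, equilibrium, lam=1.0))  # one individual sent to state 1
```

With `lam=0` the problem reduces to individual-centric recourse, and everyone crowds into the cheapest state. A positive `lam` redirects one person to the costlier state so that the population stays at the assumed equilibrium.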


X: Shapelet-Driven Post Hoc Explanations for Time Series Classification Models

Neural Information Processing Systems

Explaining time series classification models is crucial, particularly in high-stakes applications such as healthcare and finance, where transparency and trust play a critical role. Although numerous time series classification methods have identified key subsequences, known as shapelets, as core features for achieving state-of-the-art performance, validating their pivotal role in classification outcomes, existing post-hoc time series explanation (PHTSE) methods primarily focus on timestep-level feature attribution. These explanation methods overlook the fundamental prior that classification outcomes are predominantly driven by key shapelets.
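The shapelet prior rests on the standard shapelet distance. This is the minimum distance between a short pattern and every window of the same length in a series, and the best-matching window indicates where the pattern occurs. The sketch below uses raw Euclidean distance on a hypothetical toy series. Many shapelet methods z-normalize each window first, and how the paper scores shapelets is not described in this excerpt.

```python
import math
from typing import Sequence, Tuple


def shapelet_distance(series: Sequence[float], shapelet: Sequence[float]) -> Tuple[float, int]:
    """Smallest Euclidean distance between the shapelet and any equal-length window.

    Returns (distance, start index of the best-matching window).
    """
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best_dist, best_start = math.inf, -1
    for start in range(len(series) - m + 1):
        window = series[start:start + m]
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(window, shapelet)))
        if d < best_dist:  # strict '<' keeps the earliest of tied windows
            best_dist, best_start = d, start
    return best_dist, best_start


# Hypothetical series containing a "bump" and a bump-shaped shapelet.
series = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0]
print(shapelet_distance(series, [1.0, 2.0, 1.0]))  # prints: (0.0, 2)
```

An explanation built on this primitive can point to a whole subsequence (start index plus shapelet length) rather than to individual timesteps.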


MIX: A Multi-view Time-Frequency Interactive Explanation Framework for Time Series Classification

Neural Information Processing Systems

Deep learning models for time series classification (TSC) have achieved impressive performance, but explaining their decisions remains a significant challenge. Existing post-hoc explanation methods typically operate solely in the time domain and from a single-view perspective, limiting both faithfulness and robustness. In this work, we propose MIX (Multi-view Time-Frequency Interactive EXplanation Framework), a novel framework that explains deep learning models in a multi-view setting by leveraging multi-resolution time-frequency views constructed with the Haar Discrete Wavelet Transform (DWT). MIX introduces an interactive cross-view refinement scheme, in which explanatory information from one view is propagated to the other views to enhance overall interpretability. To align with user-preferred perspectives, we propose a greedy selection strategy that traverses the multi-view space to identify the most informative features. Additionally, we present OSIGV, a user-aligned segment-level attribution mechanism based on overlapping windows for each view, and introduce keystone-first IG, a method that refines explanations in each view using additional information from another view. Extensive experiments across multiple TSC benchmarks and model architectures demonstrate that MIX significantly outperforms state-of-the-art (SOTA) methods in terms of explanation faithfulness and robustness.
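The views in MIX are built from the Haar DWT. As a reference point, one level of the orthonormal Haar DWT splits a signal into scaled pairwise sums (the approximation, or low-frequency band) and scaled pairwise differences (the detail, or high-frequency band), both scaled by 1/√2. Repeating the split on the approximation yields coarser resolutions. This sketch covers only the transform; how MIX constructs and links its views is not reproduced, and the input signal is hypothetical.

```python
import math
from typing import List, Sequence, Tuple

INV_SQRT2 = 1.0 / math.sqrt(2.0)


def haar_dwt(signal: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One level of the orthonormal Haar DWT: scaled pairwise sums and differences."""
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    approx = [(signal[i] + signal[i + 1]) * INV_SQRT2 for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * INV_SQRT2 for i in range(0, len(signal), 2)]
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert one Haar level, recovering the original even-length signal."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * INV_SQRT2, (a - d) * INV_SQRT2])
    return out


def multilevel_haar(signal: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Repeatedly split the approximation, keeping one detail band per resolution."""
    approx: List[float] = list(signal)
    details: List[List[float]] = []
    for _ in range(levels):
        approx, detail = haar_dwt(approx)
        details.append(detail)
    return approx, details


x = [4.0, 2.0, 5.0, 5.0, 1.0, 3.0, 0.0, 0.0]
approx, details = multilevel_haar(x, levels=2)
print(approx, details)
```

Because the transform is orthonormal, it preserves signal energy and is exactly invertible. This makes it convenient for moving attributions between the time domain and the wavelet views.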


Explainable AI Isn't Enough! Rethinking Algorithmic Contestability

arXiv.org Machine Learning

Machine learning systems increasingly make life-changing decisions about individuals, such as loan approvals, hiring, and cheating detection, raising a pressing question: how can individuals respond to negative decisions made by these opaque systems? While explainable artificial intelligence (XAI) has largely focused on algorithmic recourse (helping individuals change their features to obtain a desired outcome), the parallel problem of algorithmic contestability (helping individuals review and correct erroneous algorithmic decisions) has received far less attention, despite its central ethical and legal importance. We trace this neglect to the absence of clear formal definitions and of a systematic operationalization of contestability as an algorithmic problem. To address it, we propose an operational definition of contestability as a natural complement to recourse: contestability starts from the presumption that a decision may be incorrect and focuses on identifying evidence to challenge and potentially overturn it, whereas recourse assumes the decision is valid and instead provides pathways for changing it. We show that standard XAI explanations, such as counterfactuals, LIME, or Anchors, even when combined with human intuitions about decision continuity or monotonicity, reveal only errors in the neighborhood of the individual but provide insufficient grounds for overturning the decision at hand. Going beyond traditional XAI, we therefore identify three types of evidence warranting reversal according to the decision maker's own ethical standards: predictive multiplicity, incorrect feature values, and neglected overruling evidence. We argue that these render decisions normatively indefensible and thus successfully contestable. Finally, we analyze how existing EU legislation connects to our framework and argue that individuals already hold some legal rights to these forms of evidence.
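Of the three evidence types, predictive multiplicity is the most directly computable. The sketch below uses one common formalization, which is an assumption here rather than taken from the paper. It collects every model whose accuracy is within ε of the best model (a "Rashomon set") and flags an individual whose prediction changes across that set. The threshold classifiers, the data, and ε are hypothetical.

```python
from typing import Callable, List, Sequence

Model = Callable[[float], int]


def accuracy(model: Model, X: Sequence[float], y: Sequence[int]) -> float:
    return sum(1 for xi, yi in zip(X, y) if model(xi) == yi) / len(y)


def rashomon_set(models: Sequence[Model], X: Sequence[float], y: Sequence[int],
                 epsilon: float) -> List[Model]:
    """Models whose accuracy is within epsilon of the best model's accuracy."""
    accs = [accuracy(m, X, y) for m in models]
    best = max(accs)
    return [m for m, a in zip(models, accs) if a >= best - epsilon]


def models_disagree(models: Sequence[Model], x: float) -> bool:
    """True if near-equally-accurate models give this individual different outcomes."""
    return len({m(x) for m in models}) > 1


# Hypothetical 1-D threshold classifiers (e.g. on a normalized credit score).
models = [lambda v, t=t: int(v >= t) for t in (0.3, 0.5, 0.7)]
X, y = [0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1]
competing = rashomon_set(models, X, y, epsilon=0.25)
print(len(competing), models_disagree(competing, 0.55))  # prints: 3 True
```

An applicant at 0.55 is rejected by one model and accepted by two equally defensible ones. A rejection therefore rests on an arbitrary model choice, which is the kind of evidence the abstract treats as grounds for contestation.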


Explaining Preferences with Shapley Values

Robert Hu

Neural Information Processing Systems

While preference modelling is becoming one of the pillars of machine learning, the problem of preference explanation remains challenging and underexplored. In this paper, we propose PREF-SHAP, a Shapley value-based model explanation framework for pairwise comparison data. We derive the appropriate value functions for preference models and further extend the framework to model and explain context-specific information, such as the surface type in a tennis game. To demonstrate the utility of PREF-SHAP, we apply our method to a variety of synthetic and real-world datasets and show that it yields richer and more insightful explanations than the baseline.
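PREF-SHAP builds on Shapley values. The paper's preference-specific value functions are not given in this abstract. The sketch below therefore only illustrates exact Shapley computation by coalition enumeration, using a hypothetical value function over tennis-style features. The feature names, effects, and interaction term are invented for illustration.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence

ValueFn = Callable[[FrozenSet[str]], float]


def shapley_values(players: Sequence[str], value: ValueFn) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (exponential in len(players))."""
    n = len(players)
    phi: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that excludes p: k! (n-k-1)! / n!
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


# Hypothetical value function: additive feature effects plus one interaction.
EFFECTS = {"serve": 2.0, "return": 1.0, "surface": 0.0}


def v(coalition: FrozenSet[str]) -> float:
    bonus = 1.0 if {"serve", "surface"} <= coalition else 0.0
    return sum(EFFECTS[f] for f in coalition) + bonus


print(shapley_values(list(EFFECTS), v))  # serve ≈ 2.5, return ≈ 1.0, surface ≈ 0.5
```

The interaction bonus is split equally between "serve" and "surface", and the values sum to v(all features) minus v(∅), which is the Shapley efficiency property.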


The Utility of Explainable AI in Ad Hoc Human-Machine Teaming Supplementary

Neural Information Processing Systems

D.2 Study 2: Additional Analysis Details

Assessing a human-machine team's time-to-build, we test for normality and homoscedasticity and do not reject the null hypothesis in either case, using the Shapiro-Wilk test (p > 0.05) and Levene's test (p > 0.7). We find a significant effect of a participant's teaming ability on the participant's build speed (F(1, 26) = 23.5; p
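The checks reported above can be reproduced in outline. The sketch below computes a one-way ANOVA F statistic together with its (between, within) degrees of freedom, the form in which a value such as F(1, 26) is reported. It also computes Levene's statistic as an ANOVA on absolute deviations from each group's center. The data are hypothetical, the excerpt does not say which model produced F(1, 26), and p-values are left to a statistics library.

```python
from statistics import fmean, median
from typing import Callable, Sequence, Tuple


def one_way_anova_f(groups: Sequence[Sequence[float]]) -> Tuple[float, Tuple[int, int]]:
    """F statistic and its (between, within) degrees of freedom for a one-way ANOVA."""
    values = [v for g in groups for v in g]
    grand = fmean(values)
    k, n = len(groups), len(values)
    ss_between = sum(len(g) * (fmean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - fmean(g)) ** 2 for g in groups for v in g)
    df_between, df_within = k - 1, n - k
    return (ss_between / df_between) / (ss_within / df_within), (df_between, df_within)


def levene_statistic(groups: Sequence[Sequence[float]],
                     center: Callable[[Sequence[float]], float] = median
                     ) -> Tuple[float, Tuple[int, int]]:
    """Levene's statistic: a one-way ANOVA on absolute deviations from each group's center.

    center=median gives the Brown-Forsythe variant; pass statistics.fmean for Levene's original.
    """
    deviations = [[abs(v - center(g)) for v in g] for g in groups]
    return one_way_anova_f(deviations)


# Hypothetical build times (minutes) for two groups of participants.
group_a, group_b = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0]
print(one_way_anova_f([group_a, group_b]))   # prints: (13.5, (1, 4))
print(levene_statistic([group_a, group_b]))  # equal spreads, so the statistic is 0.0
```

To turn F into a p-value, evaluate the upper tail of the F distribution, for instance with SciPy's `scipy.stats.f.sf(F, dfn, dfd)`. SciPy also ships `scipy.stats.levene` and `scipy.stats.shapiro` for the two assumption checks mentioned above.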


Counterfactual Explanations Can Be Manipulated

Neural Information Processing Systems

Counterfactual explanations are emerging as an attractive option for providing recourse to individuals adversely impacted by algorithmic decisions. As they are deployed in critical applications (e.g., law enforcement, financial lending), it becomes important to ensure that we clearly understand the vulnerabilities of these methods and find ways to address them. However, there is little understanding of the vulnerabilities and shortcomings of counterfactual explanations. In this work, we introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated. More specifically, we show that counterfactual explanations may converge to drastically different counterfactuals under a small perturbation, indicating that they are not robust. Leveraging this insight, we introduce a novel objective to train seemingly fair models where counterfactual explanations find much lower-cost recourse under a slight perturbation. We describe how these models can unfairly provide low-cost recourse for specific subgroups in the data while appearing fair to auditors. We perform experiments on loan and violent crime prediction datasets where certain subgroups achieve up to 20x lower-cost recourse under the perturbation. These results raise concerns regarding the dependability of current counterfactual explanation techniques, which we hope will inspire investigations into robust counterfactual explanations.
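The simplest case shows the baseline that robustness is measured against. For a linear classifier, the lowest-L2-cost counterfactual has a closed form: it moves x along w to the decision boundary, at cost (margin - score)/‖w‖. An input perturbation δ can therefore change that cost by at most |w·δ|/‖w‖ ≤ ‖δ‖. The manipulation described above concerns non-linear models and search-based counterfactual methods, where no such bound holds. This sketch covers only the linear baseline, and the model weights and features are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def linear_recourse(w: Sequence[float], b: float, x: Sequence[float],
                    margin: float = 0.0) -> Tuple[List[float], float]:
    """Minimum-L2-cost change that moves x onto the boundary w·x + b = margin.

    Returns (counterfactual, cost); an already-accepted x needs no change.
    """
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    if score >= margin:
        return list(x), 0.0
    norm_sq = sum(wi * wi for wi in w)
    step = (margin - score) / norm_sq  # move along w, the direction of steepest score increase
    cf = [xi + step * wi for xi, wi in zip(x, w)]
    return cf, (margin - score) / math.sqrt(norm_sq)


# Hypothetical loan model: approve when feature_1 + feature_2 - 1 >= 0.
w, b = [1.0, 1.0], -1.0
x = [0.0, 0.0]
delta = [0.1, 0.1]  # small perturbation of the input
cf, cost = linear_recourse(w, b, x)
_, cost_perturbed = linear_recourse(w, b, [xi + di for xi, di in zip(x, delta)])
print(cf, round(cost, 4), round(cost_perturbed, 4))
```

In this linear setting, a perturbation of size ‖δ‖ can lower recourse cost by at most ‖δ‖. The abstract reports a much larger gap (up to 20x lower cost), which requires the non-linear, adversarially trained models the paper studies.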