 Explanation & Argumentation


Diffusion Visual Counterfactual Explanations

Neural Information Processing Systems

Visual Counterfactual Explanations (VCEs) are an important tool to understand the decisions of an image classifier. They are "small" but "realistic" semantic changes of the image changing the classifier decision. Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts, or are limited to image classification problems with few classes. In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers via a diffusion process. Two modifications to the diffusion process are key for our DVCEs: first, an adaptive parameterization, whose hyperparameters generalize across images and models, together with distance regularization and late start of the diffusion process, allow us to generate images with minimal semantic changes to the original ones but different classification. Second, our cone regularization via an adversarially robust model ensures that the diffusion process does not converge to trivial non-semantic changes, but instead produces realistic images of the target class which achieve high confidence by the classifier.


On Explaining Unfairness: An Overview

arXiv.org Artificial Intelligence

Algorithmic fairness and explainability are foundational elements for achieving responsible AI. In this paper, we focus on their interplay, a research area that is recently receiving increasing attention. To this end, we first present two comprehensive taxonomies, each representing one of the two complementary fields of study: fairness and explanations. Then, we categorize explanations for fairness into three types: (a) Explanations to enhance fairness metrics, (b) Explanations to help us understand the causes of (un)fairness, and (c) Explanations to assist us in designing methods for mitigating unfairness. Finally, based on our fairness and explanation taxonomies, we present undiscovered literature paths revealing gaps that can serve as valuable insights for future research.


Feature Attribution with Necessity and Sufficiency via Dual-stage Perturbation Test for Causal Explanation

arXiv.org Artificial Intelligence

We investigate the problem of explainability in machine learning. To address this problem, Feature Attribution Methods (FAMs) measure the contribution of each feature through a perturbation test, where the difference in prediction is compared under different perturbations. However, such perturbation tests may not accurately distinguish the contributions of different features when their change in prediction is the same after perturbation. In order to enhance the ability of FAMs to distinguish different features' contributions in this challenging setting, we propose to utilize the probability (PNS) that perturbing a feature is a necessary and sufficient cause for the prediction to change as a measure of feature importance. Our approach, Feature Attribution with Necessity and Sufficiency (FANS), computes the PNS via a perturbation test involving two stages (factual and interventional). In practice, to generate counterfactual samples, we use a resampling-based approach on the observed samples to approximate the required conditional distribution. Finally, we combine FANS and gradient-based optimization to extract the subset with the largest PNS. We demonstrate that FANS outperforms existing feature attribution methods on six benchmarks.
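The idea of a probability of necessity and sufficiency can be illustrated on a toy model. The sketch below is not the FANS procedure from the abstract; it assumes a hypothetical deterministic predictor `model(x, u)` with a small, fully enumerated noise distribution, and computes the probability that the prediction is 1 with the factual feature value and 0 with the perturbed one, which is the usual structural reading of PNS.

```python
from itertools import product


def model(x, u):
    """Hypothetical toy predictor: y = (x AND u0) OR u1."""
    u0, u1 = u
    return int((x and u0) or u1)


def pns(predict, x_factual, x_perturbed, noise_dist):
    """Probability that y = 1 under x_factual and y = 0 under x_perturbed.

    noise_dist maps each background state u to its probability, so the
    sum below is exact rather than a sampled estimate.
    """
    total = 0.0
    for u, prob in noise_dist.items():
        if predict(x_factual, u) == 1 and predict(x_perturbed, u) == 0:
            total += prob
    return total


# Assumption: two independent fair binary background variables.
noise_dist = {u: 0.25 for u in product([0, 1], repeat=2)}

pns_value = pns(model, 1, 0, noise_dist)
print(pns_value)
```

Only the background state `u = (1, 0)` makes the feature both necessary and sufficient for a positive prediction here, so the value is the probability of that single state.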


The Effect of Data Poisoning on Counterfactual Explanations

arXiv.org Artificial Intelligence

Counterfactual explanations provide a popular method for analyzing the predictions of black-box systems, and they can offer the opportunity for computational recourse by suggesting actionable changes on how to change the input to obtain a different (i.e., more favorable) system output. However, recent work highlighted their vulnerability to different types of manipulations. This work studies the vulnerability of counterfactual explanations to data poisoning. We formalize data poisoning in the context of counterfactual explanations for increasing the cost of recourse on three different levels: locally for a single instance, or a sub-group of instances, or globally for all instances. We demonstrate that state-of-the-art counterfactual generation methods and toolboxes are vulnerable to such data poisoning.


One-for-many Counterfactual Explanations by Column Generation

arXiv.org Artificial Intelligence

In recent years, machine learning algorithms have been used in high-stakes decision-making settings, such as healthcare, loan approval, or parole decisions (Baesens et al., 2003; Zeng et al., 2022, 2017). Consequently, there is a growing interest and necessity in their explainability and interpretability (Du et al., 2019; Jung et al., 2020; Molnar et al., 2020; Rudin et al., 2022; Zhang et al., 2019). Once a supervised classification model has been trained, one may be interested in knowing the changes needed to be made in the features of an instance to change the prediction made by the classifier. These changes are the so-called counterfactual explanations (Martens and Provost, 2014; Wachter et al., 2017). There is a growing literature on the development of algorithms to generate counterfactual explanations, see Artelt and Hammer (2019); Guidotti (2022); Karimi et al. (2022); Sokol and Flach (2019); Stepin et al. (2021); Verma et al. (2022) for recent surveys on Counterfactual Analysis. Nevertheless, they mainly focus on the single-instance, single-counterfactual case, where for one specific instance, a single counterfactual is provided (Wachter et al., 2017; Parmentier and Vidal, 2021).
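The single-instance, single-counterfactual case described above can be sketched for the simplest possible model. The example assumes a hypothetical linear classifier with weights `w` and bias `b` (none of which come from the paper) and uses the closed-form smallest Euclidean change that crosses the decision boundary; it is an illustration of the concept, not any of the cited algorithms.

```python
def score(w, b, x):
    """Linear decision function w . x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict(w, b, x):
    """Class 1 if the score is positive, otherwise class 0."""
    return 1 if score(w, b, x) > 0 else 0


def linear_counterfactual(w, b, x, margin=1e-3):
    """Smallest L2 change to x that flips the linear classifier.

    Projects x onto the hyperplane w . x + b = 0 and overshoots by a
    small relative margin so the point lands on the other side.
    Assumes w is nonzero and x is not exactly on the boundary.
    """
    s = score(w, b, x)
    norm_sq = sum(wi * wi for wi in w)
    step = -(1.0 + margin) * s / norm_sq
    return [xi + step * wi for xi, wi in zip(x, w)]


# Hypothetical instance and model, chosen only for illustration.
w, b = [1.0, -2.0], -1.0
x = [1.0, 1.0]
x_cf = linear_counterfactual(w, b, x)
print(predict(w, b, x), predict(w, b, x_cf), x_cf)
```

The difference `x_cf - x` is the counterfactual explanation: the feature changes that would have produced the other prediction.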


ACTER: Diverse and Actionable Counterfactual Sequences for Explaining and Diagnosing RL Policies

arXiv.org Artificial Intelligence

Understanding how failure occurs and how it can be prevented in reinforcement learning (RL) is necessary to enable debugging, maintain user trust, and develop personalized policies. Counterfactual reasoning has often been used to assign blame and understand failure by searching for the closest possible world in which the failure is avoided. However, current counterfactual state explanations in RL can only explain an outcome using just the current state features and offer no actionable recourse on how a negative outcome could have been prevented. In this work, we propose ACTER (Actionable Counterfactual Sequences for Explaining Reinforcement Learning Outcomes), an algorithm for generating counterfactual sequences that provides actionable advice on how failure can be avoided. ACTER investigates actions leading to a failure and uses the evolutionary algorithm NSGA-II to generate counterfactual sequences of actions that prevent it with minimal changes and high certainty even in stochastic environments. Additionally, ACTER generates a set of multiple diverse counterfactual sequences that enable users to correct failure in the way that best fits their preferences. We also introduce three diversity metrics that can be used for evaluating the diversity of counterfactual sequences. We evaluate ACTER in two RL environments, with both discrete and continuous actions, and show that it can generate actionable and diverse counterfactual sequences. We conduct a user study to explore how explanations generated by ACTER help users identify and correct failure.


Explainable AI for Safe and Trustworthy Autonomous Driving: A Systematic Review

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) shows promising applications for the perception and planning tasks in autonomous driving (AD) due to its superior performance compared to conventional methods. However, inscrutable AI systems exacerbate the existing challenge of safety assurance of AD. One way to mitigate this challenge is to utilize explainable AI (XAI) techniques. To this end, we present the first comprehensive systematic literature review of explainable methods for safe and trustworthy AD. We begin by analyzing the requirements for AI in the context of AD, focusing on three key aspects: data, model, and agency. We find that XAI is fundamental to meeting these requirements. Based on this, we explain the sources of explanations in AI and describe a taxonomy of XAI. We then identify five key contributions of XAI for safe and trustworthy AI in AD, which are interpretable design, interpretable surrogate models, interpretable monitoring, auxiliary explanations, and interpretable validation. Finally, we propose a modular framework called SafeX to integrate these contributions, enabling explanation delivery to users while simultaneously ensuring the safety of AI models.


Game-theoretic Counterfactual Explanation for Graph Neural Networks

arXiv.org Artificial Intelligence

Graph Neural Networks (GNNs) have been a powerful tool for node classification tasks in complex networks. However, their decision-making processes remain a black box to users, making it challenging to understand the reasoning behind their predictions. Counterfactual explanations (CFE) have shown promise in enhancing the interpretability of machine learning models. Prior approaches to compute CFE for GNNs often are learning-based approaches that require training additional graphs. In this paper, we propose a semivalue-based, non-learning approach to generate CFE for node classification tasks, eliminating the need for any additional training. Our results reveal that computing Banzhaf values requires lower sample complexity in identifying the counterfactual explanations compared to other popular methods such as computing Shapley values. Our empirical evidence indicates computing Banzhaf values can achieve up to a fourfold speedup compared to Shapley values. We also design a thresholding method for computing Banzhaf values and show theoretical and empirical results on its robustness in noisy environments, making it superior to Shapley values. Furthermore, the thresholded Banzhaf values are shown to enhance efficiency without compromising the quality (i.e., fidelity) of the explanations on three popular graph datasets.
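To make the two semivalues concrete, the sketch below computes exact Banzhaf and Shapley values by enumeration on a small weighted voting game. The game, the player names, and the function names are assumptions made for illustration; this is not the paper's GNN method, its sampling scheme, or its thresholding step.

```python
from itertools import combinations
from math import factorial


def banzhaf_values(players, value):
    """Average marginal contribution over all coalitions, weighted uniformly."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += value(s | {p}) - value(s)
        result[p] = total / 2 ** (n - 1)
    return result


def shapley_values(players, value):
    """Marginal contributions weighted by coalition size, k!(n-k-1)!/n!."""
    n = len(players)
    result = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {p}) - value(s))
        result[p] = total
    return result


# Hypothetical game: a coalition wins when its total weight reaches 3.
weights = {"a": 2, "b": 1, "c": 1}


def wins(coalition):
    return 1.0 if sum(weights[p] for p in coalition) >= 3 else 0.0


players = list(weights)
banzhaf = banzhaf_values(players, wins)
shapley = shapley_values(players, wins)
print(banzhaf)
print(shapley)
```

Both semivalues rank player `a` highest in this game. They differ only in how coalitions are weighted: Banzhaf treats every coalition equally, while Shapley weights by coalition size, which is why Shapley values sum to the value of the full coalition and Banzhaf values generally do not.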


Interpretable classifiers for tabular data via discretization and feature selection

arXiv.org Artificial Intelligence

Explainability and human interpretability are becoming an increasingly important part of research on machine learning. In addition to the immediate benefits of explanations and interpretability in scientific contexts, the capacity to provide explanations behind automated decisions has also been widely addressed at the level of legislation. For example, the European General Data Protection Regulation [8] and California Consumer Privacy Act [4] both refer to the right of individuals to get explanations of automated decisions concerning them. This article investigates interpretability in the framework of tabular data. Tabular data is highly important for numerous scientific and real-life contexts, often even regarded as the most important form of data: see, e.g., [22, 2]. The aim of the current article is to introduce an efficient method for extracting highly interpretable binary classifiers from tabular data. While explainable AI (or XAI) methods custom-made for pictures and text cannot be readily used in the setting of tabular data [16], numerous successful XAI methods for tabular data exist. See the survey [20] for an overview of XAI in relation to tabular data. The authors are given in alphabetical order.


Advancing Explainable AI Toward Human-Like Intelligence: Forging the Path to Artificial Brain

arXiv.org Artificial Intelligence

The intersection of Artificial Intelligence (AI) and neuroscience in Explainable AI (XAI) is pivotal for enhancing transparency and interpretability in complex decision-making processes. This paper explores the evolution of XAI methodologies, ranging from feature-based to human-centric approaches, and delves into their applications in diverse domains, including healthcare and finance. The challenges in achieving explainability in generative models, ensuring responsible AI practices, and addressing ethical implications are discussed. The paper further investigates the potential convergence of XAI with cognitive sciences, the development of emotionally intelligent AI, and the quest for Human-Like Intelligence (HLI) in AI systems. As AI progresses towards Artificial General Intelligence (AGI), considerations of consciousness, ethics, and societal impact become paramount. The ongoing pursuit of deciphering the mysteries of the brain with AI and the quest for HLI represent transformative endeavors, bridging technical advancements with multidisciplinary explorations of human cognition.