Explanation & Argumentation
Counterfactual Explanation Policies in RL
Deshmukh, Shripad V., R, Srivatsan, Vijay, Supriti, Subramanian, Jayakumar, Agarwal, Chirag
As Reinforcement Learning (RL) agents are increasingly employed in diverse decision-making problems using reward preferences, it becomes important to ensure that policies learned by these frameworks in mapping observations to a probability distribution of the possible actions are explainable. However, there is little to no work in the systematic understanding of these complex policies in a contrastive manner, i.e., what minimal changes to the policy would improve/worsen its performance to a desired level. In this work, we present COUNTERPOL, the first framework to analyze RL policies using counterfactual explanations in the form of minimal changes to the policy that lead to the desired outcome. We do so by incorporating counterfactuals in supervised learning in RL with the target outcome regulated using desired return. We establish a theoretical connection between Counterpol and widely used trust region-based policy optimization methods in RL. Extensive empirical analysis shows the efficacy of COUNTERPOL in generating explanations for (un)learning skills while keeping close to the original policy. Our results on five different RL environments with diverse state and action spaces demonstrate the utility of counterfactual explanations, paving the way for new frontiers in designing and developing counterfactual policies.
A New Class of Explanations for Classifiers with Non-Binary Features
Two types of explanations have been receiving increased attention in the literature when analyzing the decisions made by classifiers. The first type explains why a decision was made and is known as a sufficient reason for the decision, also an abductive explanation or a PI-explanation. The second type explains why some other decision was not made and is known as a necessary reason for the decision, also a contrastive or counterfactual explanation. These explanations were defined for classifiers with binary, discrete and, in some cases, continuous features. We show that these explanations can be significantly improved in the presence of non-binary features, leading to a new class of explanations that relay more information about decisions and the underlying classifiers. Necessary and sufficient reasons were also shown to be the prime implicates and implicants of the complete reason for a decision, which can be obtained using a quantification operator. We show that our improved notions of necessary and sufficient reasons are also prime implicates and implicants but for an improved notion of complete reason obtained by a new quantification operator that we also define and study.
Modifications of the Miller definition of contrastive (counterfactual) explanations
Miller recently proposed a definition of contrastive (counterfactual) explanations based on the well-known Halpern-Pearl (HP) definitions of causes and (non-contrastive) explanations. Crucially, the Miller definition was based on the original HP definition of explanations, but this has since been modified by Halpern; presumably because the original yields counterintuitive results in many standard examples. More recently Borner has proposed a third definition, observing that this modified HP definition may also yield counterintuitive results. In this paper we show that the Miller definition inherits issues found in the original HP definition. We address these issues by proposing two improved variants based on the more robust modified HP and Borner definitions. We analyse our new definitions and show that they retain the spirit of the Miller definition where all three variants satisfy an alternative unified definition that is modular with respect to an underlying definition of non-contrastive explanations. To the best of our knowledge this paper also provides the first explicit comparison between the original and modified HP definitions.
A Model to Support Collective Reasoning: Formalization, Analysis and Computational Assessment
Ganzer, Jordi (King's College London) | Criado, Natalia (King's College London) | Lopez-Sanchez, Maite (University of Barcelona) | Parsons, Simon (University of Lincoln) | Rodriguez-Aguilar, Juan A. (Institut d'Investigaciรณ en Intelยทligรจncia Artificial (IIIA-CSIC))
In this paper we propose a new model to represent human debates and methods to obtain collective conclusions from them. This model overcomes two drawbacks of existing approaches. First, our model does not assume that participants agree on the structure of the debate. It does this by allowing participants to express their opinion about all aspects of the debate. Second, our model does not assume that participants' opinions are rational, an assumption that significantly limits current approaches. Instead, we define a weaker notion of rationality that characterises coherent opinions, and we consider different scenarios based on the coherence of individual opinions and the level of consensus. We provide a formal analysis of different opinion aggregation functions that compute a collective decision based on the individual opinions and the debate structure. In particular, we demonstrate that aggregated opinions can be coherent even if there is a lack of consensus and individual opinions are not coherent. We conclude with an empirical evaluation demonstrating that collective opinions can be computed efficiently for real-sized debates.
TbExplain: A Text-based Explanation Method for Scene Classification Models with the Statistical Prediction Correction
Aminimehr, Amirhossein, Khani, Pouya, Molaei, Amirali, Kazemeini, Amirmohammad, Cambria, Erik
The field of Explainable Artificial Intelligence (XAI) aims to improve the interpretability of black-box machine learning models. Building a heatmap based on the importance value of input features is a popular method for explaining the underlying functions of such models in producing their predictions. Heatmaps are almost understandable to humans, yet they are not without flaws. Non-expert users, for example, may not fully understand the logic of heatmaps (the logic in which relevant pixels to the model's prediction are highlighted with different intensities or colors). Additionally, objects and regions of the input image that are relevant to the model prediction are frequently not entirely differentiated by heatmaps. In this paper, we propose a framework called TbExplain that employs XAI techniques and a pre-trained object detector to present text-based explanations of scene classification models. Moreover, TbExplain incorporates a novel method to correct predictions and textually explain them based on the statistics of objects in the input image when the initial prediction is unreliable. To assess the trustworthiness and validity of the text-based explanations, we conducted a qualitative experiment, and the findings indicated that these explanations are sufficiently reliable. Furthermore, our quantitative and qualitative experiments on TbExplain with scene classification datasets reveal an improvement in classification accuracy over ResNet variants.
Explainable Data-Driven Optimization: From Context to Decision and Back Again
Forel, Alexandre, Parmentier, Axel, Vidal, Thibaut
Data-driven optimization uses contextual information and machine learning algorithms to find solutions to decision problems with uncertain parameters. While a vast body of work is dedicated to interpreting machine learning models in the classification setting, explaining decision pipelines involving learning algorithms remains unaddressed. This lack of interpretability can block the adoption of data-driven solutions as practitioners may not understand or trust the recommended decisions. We bridge this gap by introducing a counterfactual explanation methodology tailored to explain solutions to data-driven problems. We introduce two classes of explanations and develop methods to find nearest explanations of random forest and nearest-neighbor predictors. We demonstrate our approach by explaining key problems in operations management such as inventory management and routing.
Navigating Explanatory Multiverse Through Counterfactual Path Geometry
Sokol, Kacper, Small, Edward, Xuan, Yueqing
Counterfactual explanations are the de facto standard when tasked with interpreting decisions of (opaque) predictive models. Their generation is often subject to algorithmic and domain-specific constraints -- such as density-based feasibility and attribute (im)mutability or directionality of change -- that aim to maximise their real-life utility. In addition to desiderata with respect to the counterfactual instance itself, existence of a viable path connecting it with the factual data point, known as algorithmic recourse, has become an important technical consideration. While both of these requirements ensure that the steps of the journey as well as its destination are admissible, current literature neglects the multiplicity of such counterfactual paths. To address this shortcoming we introduce the novel concept of explanatory multiverse that encompasses all the possible counterfactual journeys; we then show how to navigate, reason about and compare the geometry of these trajectories -- their affinity, branching, divergence and possible future convergence -- with two methods: vector spaces and graphs. Implementing this (interactive) explanatory process grants explainees agency by allowing them to select counterfactuals based on the properties of the journey leading to them in addition to their absolute differences.
Is Task-Agnostic Explainable AI a Myth?
Our work serves as a framework for unifying the challenges of contemporary explainable AI (XAI). We demonstrate that while XAI methods provide supplementary and potentially useful output for machine learning models, researchers and decision-makers should be mindful of their conceptual and technical limitations, which frequently result in these methods themselves becoming black boxes. We examine three XAI research avenues spanning image, textual, and graph data, covering saliency, attention, and graph-type explainers. Despite the varying contexts and timeframes of the mentioned cases, the same persistent roadblocks emerge, highlighting the need for a conceptual breakthrough in the field to address the challenge of compatibility between XAI methods and application tasks.
On the Connection between Game-Theoretic Feature Attributions and Counterfactual Explanations
Albini, Emanuele, Sharma, Shubham, Mishra, Saumitra, Dervovic, Danial, Magazzeni, Daniele
Explainable Artificial Intelligence (XAI) has received widespread interest in recent years, and two of the most popular types of explanations are feature attributions, and counterfactual explanations. These classes of approaches have been largely studied independently and the few attempts at reconciling them have been primarily empirical. This work establishes a clear theoretical connection between game-theoretic feature attributions, focusing on but not limited to SHAP, and counterfactuals explanations. After motivating operative changes to Shapley values based feature attributions and counterfactual explanations, we prove that, under conditions, they are in fact equivalent. We then extend the equivalency result to game-theoretic solution concepts beyond Shapley values. Moreover, through the analysis of the conditions of such equivalence, we shed light on the limitations of naively using counterfactual explanations to provide feature importances. Experiments on three datasets quantitatively show the difference in explanations at every stage of the connection between the two approaches and corroborate the theoretical findings.
Equivalence in Argumentation Frameworks with a Claim-centric View: Classical Results with Novel Ingredients
Baumann, Ringo (Department of Computer Science, Leipzig University, Germany) | Rapberger, Anna (a:1:{s:5:"en_US";s:7:"TU Wien";}) | Ulbricht, Markus (Department of Computer Science, Leipzig University, Germany)
A common feature of non-monotonic logics is that the classical notion of equivalence does not preserve the intended meaning in light of additional information. Consequently, the term strong equivalence was coined in the literature and thoroughly investigated. In the present paper, the knowledge representation formalism under consideration is claim-augmented argumentation frameworks (CAFs) which provide a formal basis to analyze conclusion-oriented problems in argumentation by adapting a claim-focused perspective. CAFs extend Dung AFs by associating a claim to each argument representing its conclusion. In this paper, we investigate both ordinary and strong equivalence in CAFs. Thereby, we take the fact into account that one might either be interested in the actual arguments or their claims only. The former point of view naturally yields an extension of strong equivalence for AFs to the claim-based setting while the latter gives rise to a novel equivalence notion which is genuine for CAFs. We tailor, examine and compare these notions and obtain a comprehensive study of this matter for CAFs. We conclude by investigating the computational complexity of naturally arising decision problems.