Explanation & Argumentation
Counterfactual Explanations Can Be Manipulated
Counterfactual explanations are emerging as an attractive option for providing recourse to individuals adversely impacted by algorithmic decisions. As they are deployed in critical applications (e.g. law enforcement, financial lending), it becomes important to ensure that we clearly understand the vulnerabilities of these methods and find ways to address them. However, there is little understanding of the vulnerabilities and shortcomings of counterfactual explanations. In this work, we introduce the first framework that describes the vulnerabilities of counterfactual explanations and shows how they can be manipulated. More specifically, we show counterfactual explanations may converge to drastically different counterfactuals under a small perturbation, indicating they are not robust. Leveraging this insight, we introduce a novel objective to train seemingly fair models where counterfactual explanations find much lower cost recourse under a slight perturbation. We describe how these models can unfairly provide low-cost recourse for specific subgroups in the data while appearing fair to auditors. We perform experiments on loan and violent crime prediction data sets where certain subgroups achieve up to 20x lower cost recourse under the perturbation. These results raise concerns regarding the dependability of current counterfactual explanation techniques, which we hope will inspire investigations in robust counterfactual explanations.
Axe the X in XAI: A Plea for Understandable AI
In a recent paper, Erasmus et al. (2021) defend the idea that the ambiguity of the term "explanation" in explainable AI (XAI) can be solved by adopting any of four different extant accounts of explanation in the philosophy of science: the Deductive Nomological, Inductive Statistical, Causal Mechanical, and New Mechanist models. In this chapter, I show that the authors' claim that these accounts can be applied to deep neural networks as they would to any natural phenomenon is mistaken. I also provide a more general argument as to why the notion of explainability as it is currently used in the XAI literature bears little resemblance to the traditional concept of scientific explanation. It would be more fruitful to use the label "understandable AI" to avoid the confusion that surrounds the goal and purposes of XAI. In the second half of the chapter, I argue for a pragmatic conception of understanding that is better suited to play the central role attributed to explanation in XAI. Following Kuorikoski & Ylikoski (2015), the conditions of satisfaction for understanding an ML system are fleshed out in terms of an agent's success in using the system, in drawing correct inferences from it.
Modeling the Quality of Dialogical Explanations
Alshomary, Milad, Lange, Felix, Booshehri, Meisam, Sengupta, Meghdut, Cimiano, Philipp, Wachsmuth, Henning
Explanations are pervasive in our lives. Mostly, they occur in dialogical form where an explainer discusses a concept or phenomenon of interest with an explainee. Leaving the explainee with a clear understanding is not straightforward due to the knowledge gap between the two participants. Previous research looked at the interaction of explanation moves, dialogue acts, and topics in successful dialogues with expert explainers. However, daily-life explanations often fail, raising the question of what makes a dialogue successful. In this work, we study explanation dialogues in terms of the interactions between the explainer and explainee and how they correlate with the quality of explanations in terms of a successful understanding on the explainee's side. In particular, we first construct a corpus of 399 dialogues from the Reddit forum Explain Like I am Five and annotate it for interaction flows and explanation quality. We then analyze the interaction flows, comparing them to those appearing in expert dialogues. Finally, we encode the interaction flows using two language models that can handle long inputs, and we provide empirical evidence for the effectiveness boost gained through the encoding in predicting the success of explanation dialogues.
Longitudinal Counterfactuals: Constraints and Opportunities
Asemota, Alexander, Hooker, Giles
Counterfactual explanations are a common approach to providing recourse to data subjects. However, current methodology can produce counterfactuals that cannot be achieved by the subject, making the use of counterfactuals for recourse difficult to justify in practice. Though there is agreement that plausibility is an important quality when using counterfactuals for algorithmic recourse, ground truth plausibility continues to be difficult to quantify. In this paper, we propose using longitudinal data to assess and improve plausibility in counterfactuals. In particular, we develop a metric that compares longitudinal differences to counterfactual differences, allowing us to evaluate how similar a counterfactual is to prior observed changes. Furthermore, we use this metric to generate plausible counterfactuals. Finally, we discuss some of the inherent difficulties of using counterfactuals for recourse.
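One way to picture the idea of comparing counterfactual differences to longitudinal differences is sketched below. This is not the metric from the paper, whose definition the abstract does not give. It is a minimal illustration that assumes plausibility can be proxied by the Euclidean distance from a proposed counterfactual change to the nearest change actually observed over time. The function name `nearest_observed_change` and the toy numbers are hypothetical.

```python
import math

def nearest_observed_change(cf_delta, observed_deltas):
    """Distance from a counterfactual change to the closest observed change.

    cf_delta: x_cf - x for one subject (the change the counterfactual asks for).
    observed_deltas: changes x_t2 - x_t1 seen in longitudinal data.
    A smaller value means the requested change resembles something that
    has actually happened (assumed here to be a proxy for plausibility).
    """
    return min(math.dist(cf_delta, d) for d in observed_deltas)

# Hypothetical observed changes in two features between two time points.
observed = [(1.0, 0.0), (0.0, 2.0)]

close_score = nearest_observed_change((1.0, 0.0), observed)  # matches an observed change
far_score = nearest_observed_change((4.0, 4.0), observed)    # unlike anything observed
```

Under this assumed proxy, a counterfactual whose requested change matches a previously observed change scores zero, and the score grows as the requested change departs from anything seen in the data.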
Cultural Bias in Explainable AI Research: A Systematic Analysis
For synergistic interactions between humans and artificial intelligence (AI) systems, AI outputs often need to be explainable to people. Explainable AI (XAI) systems are commonly tested in human user studies. However, whether XAI researchers consider potential cultural differences in human explanatory needs remains unexplored. We highlight psychological research that found significant differences in human explanations between many people from Western, commonly individualist countries and people from non-Western, often collectivist countries. We argue that XAI research currently overlooks these variations and that many popular XAI designs implicitly and problematically assume that Western explanatory needs are shared cross-culturally. Additionally, we systematically reviewed over 200 XAI user studies and found that most studies did not consider relevant cultural variations, sampled only Western populations, but drew conclusions about human-XAI interactions more generally. We also analyzed over 30 literature reviews of XAI studies. Most reviews did not mention cultural differences in explanatory needs or flag overly broad cross-cultural extrapolations of XAI user study results. Combined, our analyses provide evidence of a cultural bias toward Western populations in XAI research, highlighting an important knowledge gap regarding how culturally diverse users may respond to widely used XAI systems that future work can and should address.
Introducing User Feedback-based Counterfactual Explanations (UFCE)
Suffian, Muhammad, Alonso-Moral, Jose M., Bogliolo, Alessandro
Machine learning models are widely used in real-world applications. However, their complexity often makes it challenging to interpret the rationale behind their decisions. Counterfactual explanations (CEs) have emerged as a viable solution for generating comprehensible explanations in eXplainable Artificial Intelligence (XAI). CE provides actionable information to users on how to achieve the desired outcome with minimal modifications to the input. However, current CE algorithms usually operate within the entire feature space when optimizing changes to turn over an undesired outcome, overlooking the identification of key contributors to the outcome and disregarding the practicality of the suggested changes. In this study, we introduce a novel methodology, named user feedback-based counterfactual explanation (UFCE), which addresses these limitations and aims to bolster confidence in the provided explanations. UFCE allows for the inclusion of user constraints to determine the smallest modifications in the subset of actionable features while considering feature dependence, and evaluates the practicality of suggested changes using benchmark evaluation metrics. We conducted three experiments with five datasets, demonstrating that UFCE outperforms two well-known CE methods in terms of proximity, sparsity, and feasibility. Reported results indicate that user constraints influence the generation of feasible CEs.
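To make the evaluation terms concrete, the sketch below computes proximity and sparsity for a single counterfactual and checks it against user-supplied constraints. The abstract does not define these metrics, so the definitions here are assumptions based on common usage: proximity as L1 distance and sparsity as the count of changed features. The helper `within_user_constraints` and the feature values are hypothetical and are not UFCE's actual procedure.

```python
def proximity(x, x_cf):
    """Sum of absolute feature differences (L1 distance); lower is closer."""
    return sum(abs(a - b) for a, b in zip(x, x_cf))

def sparsity(x, x_cf, tol=1e-9):
    """Number of features the counterfactual changes; lower is sparser."""
    return sum(1 for a, b in zip(x, x_cf) if abs(a - b) > tol)

def within_user_constraints(x, x_cf, allowed_ranges, tol=1e-9):
    """True if only features listed in allowed_ranges change,
    and every changed feature stays inside its (low, high) range."""
    for i, (a, b) in enumerate(zip(x, x_cf)):
        if abs(a - b) <= tol:
            continue  # unchanged feature, nothing to check
        if i not in allowed_ranges:
            return False  # a non-actionable feature was modified
        low, high = allowed_ranges[i]
        if not (low <= b <= high):
            return False  # change falls outside the user's range
    return True

# Hypothetical instance: age, income, open credit lines.
x = [30.0, 50000.0, 2.0]
x_cf = [30.0, 55000.0, 1.0]

# Hypothetical user feedback: income and credit lines may change, age may not.
constraints = {1: (40000.0, 60000.0), 2: (0.0, 5.0)}
```

With these toy values the counterfactual changes two features, and it satisfies the constraints only because both changed features are listed as actionable.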
Towards Explainability and Fairness in Swiss Judgement Prediction: Benchmarking on a Multilingual Dataset
S, Santosh T. Y. S., Baumgartner, Nina, Stürmer, Matthias, Grabmair, Matthias, Niklaus, Joel
The assessment of explainability in Legal Judgement Prediction (LJP) systems is of paramount importance in building trustworthy and transparent systems, particularly considering the reliance of these systems on factors that may lack legal relevance or involve sensitive attributes. This study delves into the realm of explainability and fairness in LJP models, utilizing Swiss Judgement Prediction (SJP), the only available multilingual LJP dataset. We curate a comprehensive collection of rationales that 'support' and 'oppose' judgement from legal experts for 108 cases in German, French, and Italian. By employing an occlusion-based explainability approach, we evaluate the explainability performance of state-of-the-art monolingual and multilingual BERT-based LJP models, as well as models developed with techniques such as data augmentation and cross-lingual transfer, which demonstrated prediction performance improvement. Notably, our findings reveal that improved prediction performance does not necessarily correspond to enhanced explainability performance, underscoring the significance of evaluating models from an explainability perspective. Additionally, we introduce a novel evaluation framework, Lower Court Insertion (LCI), which allows us to quantify the influence of lower court information on model predictions, exposing current models' biases.
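The general idea behind occlusion-based explanation can be sketched as follows: remove one unit of the input at a time and record how much the model's score changes. This is a generic illustration and not the paper's pipeline. It assumes sentence-level occlusion, and `toy_predict` is a hypothetical keyword scorer standing in for a BERT-based LJP model.

```python
def occlusion_scores(sentences, predict):
    """Score each sentence by the drop in prediction when it is removed.

    A positive score means the sentence pushed the prediction up
    (it supports the predicted outcome); a negative score means it
    pushed the prediction down (it opposes the outcome).
    """
    base = predict(sentences)
    scores = []
    for i in range(len(sentences)):
        occluded = sentences[:i] + sentences[i + 1:]
        scores.append(base - predict(occluded))
    return scores

def toy_predict(sentences):
    """Hypothetical stand-in model: score rises with mentions of 'breach'."""
    hits = sum("breach" in s.lower() for s in sentences)
    return min(1.0, 0.5 * hits)

facts = [
    "The appellant filed late.",
    "The contract breach is undisputed.",
    "The lower court dismissed the claim.",
]
scores = occlusion_scores(facts, toy_predict)
```

Scores of this kind can then be compared against expert-annotated rationales to judge whether the model relies on the passages that legal experts consider relevant.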
NLAS-multi: A Multilingual Corpus of Automatically Generated Natural Language Argumentation Schemes
Ruiz-Dolz, Ramon, Taverner, Joaquin, Lawrence, John, Reed, Chris
Some of the major limitations identified in the areas of argument mining, argument generation, and natural language argument analysis are related to the complexity of annotating argumentatively rich data, the limited size of these corpora, and the constraints imposed by the different languages and domains in which these data are annotated. To address these limitations, in this paper we present the following contributions: (i) an effective methodology for the automatic generation of natural language arguments in different topics and languages, (ii) the largest publicly available corpus of natural language argumentation schemes, and (iii) a set of solid baselines and fine-tuned models for the automatic identification of argumentation schemes.
Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing
Höhl, Adrian, Obadic, Ivica, Torres, Miguel Ángel Fernández, Najjar, Hiba, Oliveira, Dario, Akata, Zeynep, Dengel, Andreas, Zhu, Xiao Xiang
In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in Remote Sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the used explainable AI methods and their objectives, findings, and challenges in Remote Sensing applications is still missing. In this paper, we address this issue by performing a systematic review to identify the key trends of how explainable AI is used in Remote Sensing and shed light on novel explainable AI approaches and emerging directions that tackle specific Remote Sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights in Remote Sensing, and reflect on the approaches used for explainable AI methods evaluation. Our review provides a complete summary of the state-of-the-art in the field. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field of explainable AI in Remote Sensing.
XRL-Bench: A Benchmark for Evaluating and Comparing Explainable Reinforcement Learning Techniques
Xiong, Yu, Hu, Zhipeng, Huang, Ye, Wu, Runze, Guan, Kai, Fang, Xingchen, Jiang, Ji, Zhou, Tianze, Hu, Yujing, Liu, Haoyu, Lyu, Tangjie, Fan, Changjie
Reinforcement Learning (RL) has demonstrated substantial potential across diverse fields, yet understanding its decision-making process, especially in real-world scenarios where rationality and safety are paramount, is an ongoing challenge. This paper delves into Explainable RL (XRL), a subfield of Explainable AI (XAI) aimed at unravelling the complexities of RL models. Our focus rests on state-explaining techniques, a crucial subset within XRL methods, as they reveal the underlying factors influencing an agent's actions at any given time. Despite their significant role, the lack of a unified evaluation framework hinders assessment of their accuracy and effectiveness. To address this, we introduce XRL-Bench, a unified standardized benchmark tailored for the evaluation and comparison of XRL methods, encompassing three main modules: standard RL environments, explainers based on state importance, and standard evaluators. XRL-Bench supports both tabular and image data for state explanation. We also propose TabularSHAP, an innovative and competitive XRL method. We demonstrate the practical utility of TabularSHAP in real-world online gaming services and offer an open-source benchmark platform for the straightforward implementation and evaluation of XRL methods. Our contributions facilitate the continued progression of XRL technology.