Explanation & Argumentation
ExPUNations: Augmenting Puns with Keywords and Explanations
Sun, Jiao, Narayan-Chen, Anjali, Oraby, Shereen, Cervone, Alessandra, Chung, Tagyoung, Huang, Jing, Liu, Yang, Peng, Nanyun
The tasks of humor understanding and generation are challenging and subjective even for humans, requiring commonsense and real-world knowledge to master. Puns, in particular, add the challenge of fusing that knowledge with the ability to interpret lexical-semantic ambiguity. In this paper, we present the ExPUNations (ExPUN) dataset, in which we augment an existing dataset of puns with detailed crowdsourced annotations of keywords denoting the most distinctive words that make the text funny, pun explanations describing why the text is funny, and fine-grained funniness ratings. This is the first humor dataset with such extensive and fine-grained annotations specifically for puns. Based on these annotations, we propose two tasks: explanation generation to aid with pun classification and keyword-conditioned pun generation, to challenge the current state-of-the-art natural language understanding and generation models' ability to understand and generate humor. We showcase that the annotated keywords we collect are helpful for generating better novel humorous texts in human evaluation, and that our natural language explanations can be leveraged to improve both the accuracy and robustness of humor classifiers.
Full-Text Argumentation Mining on Scientific Publications
Binder, Arne, Verma, Bhuvanesh, Hennig, Leonhard
Scholarly Argumentation Mining (SAM) has recently gained attention due to its potential to help scholars with the rapid growth of published scientific literature. It comprises two subtasks: argumentative discourse unit recognition (ADUR) and argumentative relation extraction (ARE), both of which are challenging since they require e.g. the integration of domain knowledge, the detection of implicit statements, and the disambiguation of argument structure. While previous work focused on dataset construction and baseline methods for specific document sections, such as abstract or results, full-text scholarly argumentation mining has seen little progress. In this work, we introduce a sequential pipeline model combining ADUR and ARE for full-text SAM, and provide a first analysis of the performance of pretrained language models (PLMs) on both subtasks. We establish a new SotA for ADUR on the Sci-Arg corpus, outperforming the previous best reported result by a large margin (+7% F1). We also present the first results for ARE, and thus for the full AM pipeline, on this benchmark dataset. Our detailed error analysis reveals that non-contiguous ADUs as well as the interpretation of discourse connectors pose major challenges and that data annotation needs to be more consistent.
eBook: Intuitive Machine Learning and Explainable AI - Machine Learning Techniques
By Vincent Granville Ph.D. Published in September 2022. This book covers the foundations of machine learning, with modern approaches to solving complex problems. Emphasis is on scalability, automation, testing, optimizing, and interpretability (explainable AI). For instance, regression techniques -- including logistic and Lasso -- are presented as a single method, without using advanced linear algebra. There is no need to learn 50 versions when one does it all and more.
The privacy issue of counterfactual explanations: explanation linkage attacks
Goethals, Sofie, Sörensen, Kenneth, Martens, David
Black-box machine learning models are being used in more and more high-stakes domains, which creates a growing need for Explainable AI (XAI). Unfortunately, the use of XAI in machine learning introduces new privacy risks, which currently remain largely unnoticed. We introduce the explanation linkage attack, which can occur when deploying instance-based strategies to find counterfactual explanations. To counter such an attack, we propose k-anonymous counterfactual explanations and introduce pureness as a new metric to evaluate the validity of these k-anonymous counterfactual explanations. Our results show that making the explanations, rather than the whole dataset, k- anonymous, is beneficial for the quality of the explanations.
Diffusion Visual Counterfactual Explanations
Augustin, Maximilian, Boreiko, Valentyn, Croce, Francesco, Hein, Matthias
Visual Counterfactual Explanations (VCEs) are an important tool to understand the decisions of an image classifier. They are "small" but "realistic" semantic changes of the image changing the classifier decision. Current approaches for the generation of VCEs are restricted to adversarially robust models and often contain non-realistic artefacts, or are limited to image classification problems with few classes. In this paper, we overcome this by generating Diffusion Visual Counterfactual Explanations (DVCEs) for arbitrary ImageNet classifiers via a diffusion process. Two modifications to the diffusion process are key for our DVCEs: first, an adaptive parameterization, whose hyperparameters generalize across images and models, together with distance regularization and late start of the diffusion process, allow us to generate images with minimal semantic changes to the original ones but different classification. Second, our cone regularization via an adversarially robust model ensures that the diffusion process does not converge to trivial non-semantic changes, but instead produces realistic images of the target class which achieve high confidence by the classifier.
Modelling Control Arguments via Cooperation Logic in Unforeseen Scenarios
The intent of control argumentation frameworks is to specifically model strategic scenarios from the perspective of an agent by extending the standard model of argumentation framework in a way that takes unquantified uncertainty regarding arguments and attacks into account. They do not, however, adequately account for coalition formation and interactions among a set of agents in an uncertain environment. To address this challenge, we propose a formalism of a multi-agent scenario via cooperation logic and investigate agents' strategies and actions in a dynamic environment.
Augmentation by Counterfactual Explanation -- Fixing an Overconfident Classifier
Singla, Sumedha, Murali, Nihal, Arabshahi, Forough, Triantafyllou, Sofia, Batmanghelich, Kayhan
A highly accurate but overconfident model is ill-suited for deployment in critical applications such as healthcare and autonomous driving. The classification outcome should reflect a high uncertainty on ambiguous in-distribution samples that lie close to the decision boundary. The model should also refrain from making overconfident decisions on samples that lie far outside its training distribution, far-out-of-distribution (far-OOD), or on unseen samples from novel classes that lie near its training distribution (near-OOD). This paper proposes an application of counterfactual explanations in fixing an over-confident classifier. Specifically, we propose to fine-tune a given pre-trained classifier using augmentations from a counterfactual explainer (ACE) to fix its uncertainty characteristics while retaining its predictive performance. We perform extensive experiments with detecting far-OOD, near-OOD, and ambiguous samples. Our empirical results show that the revised model have improved uncertainty measures, and its performance is competitive to the state-of-the-art methods.
Counterfactual Explanations for Reinforcement Learning
Gajcin, Jasmina, Dusparic, Ivana
While AI algorithms have shown remarkable success in various fields, their lack of transparency hinders their application to real-life tasks. Although explanations targeted at non-experts are necessary for user trust and human-AI collaboration, the majority of explanation methods for AI are focused on developers and expert users. Counterfactual explanations are local explanations that offer users advice on what can be changed in the input for the output of the black-box model to change. Counterfactuals are user-friendly and provide actionable advice for achieving the desired output from the AI system. While extensively researched in supervised learning, there are few methods applying them to reinforcement learning (RL). In this work, we explore the reasons for the underrepresentation of a powerful explanation method in RL. We start by reviewing the current work in counterfactual explanations in supervised learning. Additionally, we explore the differences between counterfactual explanations in supervised learning and RL and identify the main challenges that prevent adoption of methods from supervised in reinforcement learning. Finally, we redefine counterfactuals for RL and propose research directions for implementing counterfactuals in RL.
Towards Procedural Fairness: Uncovering Biases in How a Toxic Language Classifier Uses Sentiment Information
Nejadgholi, Isar, Balkır, Esma, Fraser, Kathleen C., Kiritchenko, Svetlana
Previous works on the fairness of toxic language classifiers compare the output of models with different identity terms as input features but do not consider the impact of other important concepts present in the context. Here, besides identity terms, we take into account high-level latent features learned by the classifier and investigate the interaction between these features and identity terms. For a multi-class toxic language classifier, we leverage a concept-based explanation framework to calculate the sensitivity of the model to the concept of sentiment, which has been used before as a salient feature for toxic language detection. Our results show that although for some classes, the classifier has learned the sentiment information as expected, this information is outweighed by the influence of identity terms as input features. This work is a step towards evaluating procedural fairness, where unfair processes lead to unfair outcomes. The produced knowledge can guide debiasing techniques to ensure that important concepts besides identity terms are well-represented in training datasets.
Black Box Model Explanations and the Human Interpretability Expectations -- An Analysis in the Context of Homicide Prediction
Ribeiro, José, Carneiro, Níkolas, Alves, Ronnie
Strategies based on Explainable Artificial Intelligence - XAI have promoted better human interpretability of the results of black box machine learning models. This sets a precedent for questioning whether or not human expectations are being met when faced with the explanations of this type of model. The XAI measures being currently used (Ciu, Dalex, Eli5, Lofo, Shap, and Skater) provide various forms of explanations, including global rankings of relevance of attributes, which allow for an overview of how the model is explained as a result of its inputs and outputs. These measures provide for an increase in the explainability of the model and a greater interpretability grounded on the context of the problem. Current research points to the need for further studies (within a specific context/problem) on how these explanations meet the Interpretability Expectations of human experts and how they can be used to make the model even more transparent while taking into account specific complexities of the model and dataset being analyzed, as well as important human factors of sensitive real-world contexts/problems. Intending to shed light on the explanations generated by XAI measures and their interpretabilities, this research addresses a real-world classification problem related to homicide prediction, duly endorsed by the scientific community, replicated its proposed black box model and used 6 different XAI measures to generate explanations and 6 different human experts to generate what this research referred to as Interpretability Expectations - IE. The results were computed by means of comparative analysis and identification of relationships among all the attribute ranks produced, and 49% concordance was found among attributes indicated by means of XAI measures and human experts, 41% exclusively by XAI measures and 10% exclusively by human experts. The results allow for answering questions such as: "Do the different XAI measures generate similar explanations for the proposed problem?", "Are the interpretability expectations generated among different human experts similar?","Do the