Adversarial images highlight how vulnerable modern image classifiers are to perturbations outside of their training set. Human oversight might mitigate this weakness, but depends on humans understanding the AI well enough to predict when it is likely to make a mistake. In previous work we have found that humans tend to assume that the AI's decision process mirrors their own. Here we evaluate whether methods from explainable AI can disrupt this assumption to help participants predict AI classifications for adversarial and standard images. We find that both saliency maps and examples facilitate catching AI errors, but their effects are not additive, and saliency maps are more effective than examples.
Among different ML models, Deep Neural Networks (DNNs) are well known and widely used for their powerful representation learning from high-dimensional data such as images, text, and speech. However, as ML algorithms enter sensitive real-world domains with trustworthiness, safety, and fairness prerequisites, the need for corresponding techniques and metrics for high-stakes domains is more noticeable than before. Hence, researchers in different fields propose guidelines for Trustworthy AI, Safe AI, and Explainable AI as stepping stones for next-generation Responsible AI [6, 247]. Furthermore, government reports and regulations on AI accountability, trustworthiness, and safety are gradually creating binding laws to protect citizens' data privacy, ensure fair data processing, and uphold safety for AI-based products. The development and deployment of ML algorithms for open-world tasks come with reliability and dependability limitations stemming from limits on model performance, robustness, and uncertainty. Unlike traditional code-based software, ML models have fundamental safety drawbacks, including performance limitations on their training set and run-time robustness in their operational domain.
Delseny, Hervé, Gabreau, Christophe, Gauffriau, Adrien, Beaudouin, Bernard, Ponsolle, Ludovic, Alecu, Lucian, Bonnin, Hugues, Beltran, Brice, Duchel, Didier, Ginestet, Jean-Brice, Hervieu, Alexandre, Martinez, Ghilaine, Pasquet, Sylvain, Delmas, Kevin, Pagetti, Claire, Gabriel, Jean-Marc, Chapdelaine, Camille, Picard, Sylvaine, Damour, Mathieu, Cappi, Cyril, Gardès, Laurent, De Grancey, Florence, Jenn, Eric, Lefevre, Baptiste, Flandin, Gregory, Gerchinovitz, Sébastien, Mamalet, Franck, Albore, Alexandre
Machine Learning (ML) seems to be one of the most promising solutions for partially or completely automating some of the complex tasks currently performed by humans, such as driving vehicles or recognizing voices. It is also an opportunity to implement and embed new capabilities beyond the reach of classical implementation techniques. However, ML techniques introduce new potential risks. Therefore, they have only been applied in systems where their benefits are considered worth the increase in risk. In practice, ML techniques raise multiple challenges that could prevent their use in systems subject to certification constraints. But what are the actual challenges? Can they be overcome by selecting appropriate ML techniques, or by adopting new engineering or certification practices? These are some of the questions addressed by the ML Certification Workgroup (WG) set up by the Institut de Recherche Technologique Saint Exupéry de Toulouse (IRT), as part of the DEEL Project.
State-of-the-art deep-learning systems use decision rules that are challenging for humans to model. Explainable AI (XAI) attempts to improve human understanding but rarely accounts for how people typically reason about unfamiliar agents. We propose explicitly modeling the human explainee via Bayesian Teaching, which evaluates explanations by how much they shift explainees' inferences toward a desired goal. We assess Bayesian Teaching in a binary image classification task across a variety of contexts. Absent intervention, participants predict that the AI's classifications will match their own, but explanations generated by Bayesian Teaching improve their ability to predict the AI's judgements by moving them away from this prior belief. Bayesian Teaching further allows each case to be broken down into sub-examples (here saliency maps). These sub-examples complement whole examples by improving error detection for familiar categories, whereas whole examples help predict correct AI judgements of unfamiliar cases.
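The selection principle described above, scoring a candidate explanation by how far it moves the explainee toward a target inference, can be illustrated with a minimal sketch. This is not the authors' implementation: the two hypotheses, the example names, and all probabilities below are hypothetical values chosen only to show the mechanics, with the prior set to reflect the reported tendency of participants to assume the AI agrees with them.

```python
def learner_posterior(prior, likelihood, example):
    """Bayes' rule for a simulated explainee who observes one example."""
    unnormalized = {h: prior[h] * likelihood[h][example] for h in prior}
    total = sum(unnormalized.values())
    return {h: p / total for h, p in unnormalized.items()}


def select_explanation(prior, likelihood, examples, target):
    """Score each example by the posterior it induces on the target hypothesis."""
    scores = {e: learner_posterior(prior, likelihood, e)[target] for e in examples}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical setup: the explainee starts out believing the AI matches them.
prior = {"ai_matches_human": 0.8, "ai_differs": 0.2}

# Hypothetical likelihoods P(example | hypothesis) for three candidate examples.
likelihood = {
    "ai_matches_human": {"ex_a": 0.6, "ex_b": 0.3, "ex_c": 0.1},
    "ai_differs": {"ex_a": 0.2, "ex_b": 0.3, "ex_c": 0.5},
}

best, scores = select_explanation(
    prior, likelihood, ["ex_a", "ex_b", "ex_c"], target="ai_differs"
)
print(best, scores)
```

In this toy setting the chosen example is the one that is much more probable under the target hypothesis than under the explainee's default belief, which is what shifts the posterior away from the prior.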
Explainable Artificial Intelligence (XAI) has in recent years become a well-suited framework for generating human-understandable explanations of black-box models. In this paper, we present a novel XAI visual explanation algorithm, denoted SIDU, that can effectively localize the entire object regions responsible for a prediction to their full extent. We analyze its robustness and effectiveness through various computational and human-subject experiments. In particular, we assess the SIDU algorithm using three different types of evaluation (Application, Human and Functionally-Grounded) to demonstrate its superior performance. The robustness of SIDU is further studied in the presence of adversarial attacks on black-box models to better understand its performance.
With the ever-growing achievements in Artificial Intelligence (AI) and the recently boosted enthusiasm for Financial Technology (FinTech), applications such as credit scoring have gained substantial academic interest. Credit scoring helps financial experts make better decisions regarding whether or not to accept a loan application, such that loans with a high probability of default are not accepted. Apart from the noisy and highly imbalanced data challenges faced by such credit scoring models, recent regulations such as the 'right to explanation' introduced by the General Data Protection Regulation (GDPR) and the Equal Credit Opportunity Act (ECOA) have added the need for model interpretability to ensure that algorithmic decisions are understandable and coherent. An interesting concept that has been recently introduced is eXplainable AI (XAI), which focuses on making black-box models more interpretable. In this work, we present a credit scoring model that is both accurate and interpretable. For classification, state-of-the-art performance on the Home Equity Line of Credit (HELOC) and Lending Club (LC) Datasets is achieved using the Extreme Gradient Boosting (XGBoost) model. The model is then further enhanced with a 360-degree explanation framework, which provides different explanations (i.e. Evaluation through the use of functionally-grounded, application-grounded and human-grounded analysis shows that the explanations provided are simple and consistent, and that they satisfy the six predetermined hypotheses testing for correctness, effectiveness, easy understanding, detail sufficiency and trustworthiness. Credit scoring models are decision models that help lenders decide whether or not to accept a loan application based on the model's expectation of whether the applicant will be capable of repaying the financial obligations.
Such models are beneficial since they reduce the time needed for the loan approval process, allow loan officers to concentrate on only a percentage of the applications, lead to cost savings, reduce human subjectivity and decrease default risk. There has been a lot of research on this problem, with various Machine Learning (ML) and Artificial Intelligence (AI) techniques proposed. Such techniques might be exceptional in predictive power but are also known as black-box methods since they provide no explanations behind their decisions, making humans unable to interpret them. Therefore, it is highly unlikely that any financial expert is ready to trust the predictions of a model without any sort of justification. With regard to credit scoring, lenders will need to understand the model's predictions to ensure that decisions are made for the correct reasons.
Shapley values underlie one of the most popular model-agnostic methods within explainable artificial intelligence. These values are designed to attribute the difference between a model's prediction and an average baseline to the different features used as input to the model. Being based on solid game-theoretic principles, Shapley values uniquely satisfy several desirable properties, which is why they are increasingly used to explain the predictions of possibly complex and highly non-linear machine learning models. Shapley values are well calibrated to a user's intuition when features are independent, but may lead to undesirable, counterintuitive explanations when the independence assumption is violated. In this paper, we propose a novel framework for computing Shapley values that generalizes recent work that aims to circumvent the independence assumption. By employing Pearl's do-calculus, we show how these 'causal' Shapley values can be derived for general causal graphs without sacrificing any of their desirable properties. Moreover, causal Shapley values enable us to separate the contribution of direct and indirect effects. We provide a practical implementation for computing causal Shapley values based on causal chain graphs when only partial information is available and illustrate their utility on a real-world example.
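As a concrete illustration of the attribution described above, the following sketch computes exact Shapley values for a three-feature toy model by enumerating all coalitions. It uses the conventional marginal value function, which replaces absent features with values from a background sample and therefore relies on the independence assumption; it is not the causal variant proposed in the paper. The model `toy_model`, the instance `x`, and the `background` rows are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for instance x under the marginal value function."""
    n = len(x)

    def value(subset):
        # Average prediction with features in `subset` fixed to x and the
        # remaining features taken from each background row in turn.
        total = 0.0
        for row in background:
            mixed = [x[i] if i in subset else row[i] for i in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(features):
    a, b, c = features
    return 2.0 * a + b * c  # additive in a, interaction between b and c


background = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0]]
x = [1.0, 2.0, 3.0]

phi = shapley_values(toy_model, x, background)
baseline = sum(toy_model(row) for row in background) / len(background)
print(phi, sum(phi), toy_model(x) - baseline)
```

The attributions sum to the difference between the prediction for `x` and the average baseline prediction, which is the efficiency property referred to in the text. Enumerating coalitions is exponential in the number of features, so this form is only practical for very small examples.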
The growing availability of data and computing power fuels the development of predictive models. In order to ensure the safe and effective functioning of such models, we need methods for exploration, debugging, and validation. New methods and tools for this purpose are being developed within the eXplainable Artificial Intelligence (XAI) subdomain of machine learning. In this work (1) we present a taxonomy of methods for model explanations, (2) we identify and compare 27 packages available in R to perform XAI analysis, (3) we present an example application of particular packages, and (4) we acknowledge recent trends in XAI. The article is primarily devoted to the tools available in R, but since it is easy to integrate Python code, we will also show examples for the most popular Python libraries.
This has led to the development of a plethora of domain-dependent and context-specific methods for dealing with the interpretation of machine learning (ML) models and the formation of explanations for humans. Unfortunately, this trend is far from being over, and the abundant knowledge in the field is scattered and needs organisation. The goal of this article is to systematically review research works in the field of XAI and to try to define some boundaries in the field. From several hundred research articles focused on the concept of explainability, about 350 have been considered for review by using the following search methodology. In a first phase, Google Scholar was queried to find papers related to "explainable artificial intelligence", "explainable machine learning" and "interpretable machine learning". Subsequently, the bibliographic sections of these articles were thoroughly examined to retrieve further relevant scientific studies. The first noticeable thing, as shown in figure 2 (a), is the distribution of the publication dates of selected research articles: sporadic in the 70s and 80s, receiving preliminary attention in the 90s, showing rising interest in 2000 and becoming a recognised body of knowledge after 2010. The first research concerned the development of an explanation-based system and its integration in a computer program designed to help doctors make diagnoses. Some of the more recent papers focus on work devoted to the clustering of methods for explainability, motivating the need for organising the XAI literature [4, 5, 6].
Explanations in Machine Learning come in many forms, but a consensus regarding their desired properties is yet to emerge. In this paper we introduce a taxonomy and a set of descriptors that can be used to characterise and systematically assess explainable systems along five key dimensions: functional, operational, usability, safety and validation. In order to design a comprehensive and representative taxonomy and associated descriptors we surveyed the eXplainable Artificial Intelligence literature, extracting the criteria and desiderata that other authors have proposed or implicitly used in their research. The survey includes papers introducing new explainability algorithms, to see what criteria are used to guide their development and how these algorithms are evaluated, as well as papers proposing such criteria from both computer science and social science perspectives. This novel framework allows explainability approaches to be systematically compared and contrasted, not just to better understand their capabilities but also to identify discrepancies between their theoretical qualities and the properties of their implementations. We developed an operationalisation of the framework in the form of Explainability Fact Sheets, which enable researchers and practitioners alike to quickly grasp the capabilities and limitations of a particular explainable method. When used as a Work Sheet, our taxonomy can guide the development of new explainability approaches by aiding in their critical evaluation along the five proposed dimensions.