Explanation & Argumentation
Helpful, Misleading or Confusing: How Humans Perceive Fundamental Building Blocks of Artificial Intelligence Explanations
Small, Edward, Xuan, Yueqing, Hettiachchi, Danula, Sokol, Kacper
Explainable artificial intelligence techniques are developed at breakneck speed, but suitable evaluation approaches lag behind. With explainers becoming increasingly complex and a lack of consensus on how to assess their utility, it is challenging to judge the benefit and effectiveness of different explanations. To address this gap, we take a step back from sophisticated predictive algorithms and instead look into explainability of simple decision-making models. In this setting, we aim to assess how people perceive comprehensibility of their different representations such as mathematical formulation, graphical representation and textual summarisation (of varying complexity and scope). This allows us to capture how diverse stakeholders -- engineers, researchers, consumers, regulators and the like -- judge intelligibility of fundamental concepts that more elaborate artificial intelligence explanations are built from. This position paper charts our approach to establishing appropriate evaluation methodology as well as a conceptual and practical framework to facilitate setting up and executing relevant user studies.
The XAISuite framework and the implications of explanatory system dissonance
Mitra, Shreyan, Gilpin, Leilani
Explanatory systems make machine learning models more transparent. However, they are often inconsistent. In order to quantify and isolate possible scenarios leading to this discrepancy, this paper compares two explanatory systems, SHAP and LIME, based on the correlation of their respective importance scores using 14 machine learning models (7 regression and 7 classification) and 4 tabular datasets (2 regression and 2 classification). We make two novel findings. Firstly, the magnitude of importance is not significant in explanation consistency. The correlations between SHAP and LIME importance scores for the most important features may or may not be more variable than the correlation between SHAP and LIME importance scores averaged across all features. Secondly, the similarity between SHAP and LIME importance scores cannot predict model accuracy. In the process of our research, we construct an open-source library, XAISuite, that unifies the process of training and explaining models. Finally, this paper contributes a generalized framework to better explain machine learning models and optimize their performance.
Explainable AI Classifies Colorectal Cancer with Personalized Gut Microbiome Analysis
The gut microbiome comprises a complex population of different bacterial species that are essential to human health. In recent years, scientists across several fields have found that changes in the gut microbiome can be linked to a wide variety of diseases, notably colorectal cancer (CRC). Multiple studies have revealed that a higher abundance of certain bacteria, such as Fusobacterium nucleatum and Parvimonas micra, is typically associated with CRC progression. Based on these findings, researchers have developed various artificial intelligence (AI) models to help them analyze which bacterial species are useful as CRC biomarkers. However, most of these models rely on what is known as "global explanations," meaning that they can only consider the entirety of the input data to make predictions.
Using large language models for (de-)formalization and natural argumentation exercises for beginner's students
We describe two systems that use text-davinci-003, a large language model, for the automatized correction of (i) exercises in translating back and forth between natural language and the languages of propositional logic and first-order predicate logic and (ii) exercises in writing simple arguments in natural language in non-mathematical scenarios.
A Review on Explainable Artificial Intelligence for Healthcare: Why, How, and When?
Bharati, Subrato, Mondal, M. Rubaiyat Hossain, Podder, Prajoy
Artificial intelligence (AI) models are increasingly finding applications in the field of medicine. Concerns have been raised about the explainability of the decisions that are made by these AI models. In this article, we give a systematic analysis of explainable artificial intelligence (XAI), with a primary focus on models that are currently being used in the field of healthcare. The literature search is conducted following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) standards for relevant work published from 1 January 2012 to 02 February 2022. The review analyzes the prevailing trends in XAI and lays out the major directions in which research is headed. We investigate the why, how, and when of the uses of these XAI models and their implications. We present a comprehensive examination of XAI methodologies as well as an explanation of how a trustworthy AI can be derived from describing AI models for healthcare fields. The discussion of this work will contribute to the formalization of the XAI field.
The Rise of Explainable AI and its Implications for Transparency and Accountability
Artificial intelligence (AI) has rapidly become an essential part of modern society, powering everything from virtual assistants to medical diagnosis tools. As AI systems have become more advanced, the lack of transparency and accountability has become an increasingly pressing concern. This is where explainable AI (XAI) comes in. XAI is a new approach to developing AI systems that prioritise transparency and accountability by making their decision-making processes understandable to humans. This article explores the rise of explainable AI and its implications.
Explainable AI And Visual Reasoning: Insights From Radiology
Why do explainable AI (XAI) explanations in radiology, despite their promise of transparency, still fail to gain human trust? Current XAI approaches provide justification for predictions, however, these do not meet practitioners' needs. These XAI explanations lack intuitive coverage of the evidentiary basis for a given classification, posing a significant barrier to adoption. We posit that XAI explanations that mirror human processes of reasoning and justification with evidence may be more useful and trustworthy than traditional visual explanations like heat maps. Using a radiology case study, we demonstrate how radiology practitioners get other practitioners to see a diagnostic conclusion's validity. Machine-learned classifications lack this evidentiary grounding and consequently fail to elicit trust and adoption by potential users. Insights from this study may generalize to guiding principles for human-centered explanation design based on human reasoning and justification of evidence.
Towards Explainable AI Writing Assistants for Non-native English Speakers
Kim, Yewon, Lee, Mina, Kim, Donghwi, Lee, Sung-Ju
We highlight the challenges faced by non-native speakers when using AI writing assistants to paraphrase text. Through an interview study with 15 non-native English speakers (NNESs) with varying levels of English proficiency, we observe that they face difficulties in assessing paraphrased texts generated by AI writing assistants, largely due to the lack of explanations accompanying the suggested paraphrases. Furthermore, we examine their strategies to assess AI-generated texts in the absence of such explanations. Drawing on the needs of NNESs identified in our interview, we propose four potential user interfaces to enhance the writing experience of NNESs using AI writing assistants. The proposed designs focus on incorporating explanations to better support NNESs in understanding and evaluating the AI-generated paraphrasing suggestions.
Benchmarking Faithfulness: Towards Accurate Natural Language Explanations in Vision-Language Tasks
With deep neural models increasingly permeating our daily lives comes a need for transparent and comprehensible explanations of their decision-making. However, most explanation methods that have been developed so far are not intuitively understandable for lay users. In contrast, natural language explanations (NLEs) promise to enable the communication of a model's decision-making in an easily intelligible way. While current models successfully generate convincing explanations, it is an open question how well the NLEs actually represent the reasoning process of the models - a property called faithfulness. Although the development of metrics to measure faithfulness is crucial to designing more faithful models, current metrics are either not applicable to NLEs or are not designed to compare different model architectures across multiple modalities. Building on prior research on faithfulness measures and based on a detailed rationale, we address this issue by proposing three faithfulness metrics: Attribution-Similarity, NLE-Sufficiency, and NLE-Comprehensiveness. The efficacy of the metrics is evaluated on the VQA-X and e-SNLI-VE datasets of the e-ViL benchmark for vision-language NLE generation by systematically applying modifications to the performant e-UG model for which we expect changes in the measured explanation faithfulness. We show on the e-SNLI-VE dataset that the removal of redundant inputs to the explanation-generation module of e-UG successively increases the model's faithfulness on the linguistic modality as measured by Attribution-Similarity. Further, our analysis demonstrates that NLE-Sufficiency and -Comprehensiveness are not necessarily correlated to Attribution-Similarity, and we discuss how the two metrics can be utilized to gain further insights into the explanation generation process.
Is More Always Better? The Effects of Personal Characteristics and Level of Detail on the Perception of Explanations in a Recommender System
Chatti, Mohamed Amine, Guesmi, Mouadh, Vorgerd, Laura, Ngo, Thao, Joarder, Shoeb, Ain, Qurat Ul, Muslim, Arham
Despite the acknowledgment that the perception of explanations may vary considerably between end-users, explainable recommender systems (RS) have traditionally followed a one-size-fits-all model, whereby the same explanation level of detail is provided to each user, without taking into consideration individual user's context, i.e., goals and personal characteristics. To fill this research gap, we aim in this paper at a shift from a one-size-fits-all to a personalized approach to explainable recommendation by giving users agency in deciding which explanation they would like to see. We developed a transparent Recommendation and Interest Modeling Application (RIMA) that provides on-demand personalized explanations of the recommendations, with three levels of detail (basic, intermediate, advanced) to meet the demands of different types of end-users. We conducted a within-subject study (N=31) to investigate the relationship between user's personal characteristics and the explanation level of detail, and the effects of these two variables on the perception of the explainable RS with regard to different explanation goals. Our results show that the perception of explainable RS with different levels of detail is affected to different degrees by the explanation goal and user type. Consequently, we suggested some theoretical and design guidelines to support the systematic design of explanatory interfaces in RS tailored to the user's context.