Calls to arms to build interpretable models express a well-founded discomfort with machine learning. Should a software agent that does not even know what a loan is decide who qualifies for one? Indeed, we ought to be cautious about injecting machine learning (or anything else, for that matter) into applications where there may be a significant risk of causing social harm. However, claims that stakeholders "just won't accept that!" do not provide a sufficient foundation for a proposed field of study. For the field of interpretable machine learning to advance, we must ask the following questions: What precisely won't various stakeholders accept? What do they want? Are these desiderata reasonable? Are they feasible? In order to answer these questions, we'll have to give real-world problems and their respective stakeholders greater consideration.
Any sufficiently advanced technology is indistinguishable from magic. In the world of artificial intelligence and machine learning (AI & ML), the black-box and white-box categorization of models and algorithms refers to their interpretability. That is, given a model trained to map data inputs to outputs (e.g. [...]). And just as the software testing dichotomy contrasts high-level behavior with low-level logic, only white-box AI methods can be readily interpreted to see the logic behind models' predictions. In recent years, as machine learning has taken over new industries and applications where users far outnumber the experts who grok the models and algorithms, the conversation around interpretability has become an important one.
There has recently been a surge of work in explanatory artificial intelligence (XAI). This research area tackles the important problem that complex machines and algorithms often cannot provide insights into their behavior and thought processes. XAI allows users and parts of the internal system to be more transparent, providing explanations of their decisions in some level of detail. These explanations are important to ensure algorithmic fairness, to identify potential bias or problems in the training data, and to ensure that the algorithms perform as expected. However, the explanations produced by these systems are neither standardized nor systematically assessed. In an effort to create best practices and identify open challenges, we provide our definition of explainability and show how it can be used to classify existing literature. We discuss why current approaches to explanatory methods, especially for deep neural networks, are insufficient. Finally, based on our survey, we conclude with suggested future research directions for explanatory artificial intelligence.
Recent work on interpretability in machine learning and AI has focused on building simplified models that approximate the true criteria used to make decisions. These models are a useful pedagogical device for teaching trained professionals how to predict what decisions will be made by the complex system and, most importantly, how the system might break. However, when considering any such model it's important to remember Box's maxim that "All models are wrong but some are useful." We focus on the distinction between these models and explanations in philosophy and sociology. These models can be understood as a "do it yourself kit" for explanations, allowing a practitioner to directly answer "what-if" questions or generate contrastive explanations without external assistance. Although this is a valuable ability, giving these models as explanations appears more difficult than necessary, and other forms of explanation may not have the same trade-offs. We contrast the different schools of thought on what makes an explanation, and suggest that machine learning might benefit from viewing the problem more broadly.
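As a rough illustration of a simplified model that approximates a complex system's decisions, the minimal sketch below fits a one-rule surrogate to a stand-in black box and then uses it for a what-if query. Everything named here is a hypothetical assumption made for illustration: the `black_box` scorer and its formula, the `income` and `debt` features, and the helpers `fit_stump` and `stump_predict`. None of it is taken from the work described above.

```python
import random


def black_box(income, debt):
    """Hypothetical opaque scorer standing in for a complex model (assumption)."""
    score = 0.8 * income - 1.1 * debt + 0.3 * income * debt / 100.0
    return 1 if score > 20.0 else 0


def fit_stump(rows, labels):
    """Search for the single-feature threshold rule that best mimics `labels`.

    Returns (agreement, feature_index, threshold, polarity), where agreement
    is the fraction of rows on which the rule matches the labels.
    """
    best = None
    for feature in range(len(rows[0])):
        for threshold in sorted({row[feature] for row in rows}):
            for polarity in (1, 0):
                preds = [
                    polarity if row[feature] > threshold else 1 - polarity
                    for row in rows
                ]
                agreement = sum(p == y for p, y in zip(preds, labels)) / len(labels)
                if best is None or agreement > best[0]:
                    best = (agreement, feature, threshold, polarity)
    return best


def stump_predict(stump, row):
    """Apply the fitted one-rule surrogate to a single input row."""
    _, feature, threshold, polarity = stump
    return polarity if row[feature] > threshold else 1 - polarity


# Sample inputs, label them with the black box, and fit the surrogate to
# the black box's outputs (not to any ground truth).
rng = random.Random(0)
rows = [(rng.uniform(0, 100), rng.uniform(0, 50)) for _ in range(300)]
labels = [black_box(*row) for row in rows]
surrogate = fit_stump(rows, labels)

# Fidelity: how often the surrogate agrees with the black box on the sample.
fidelity = surrogate[0]

# A "what-if" query answered by the surrogate alone, without the black box.
before = stump_predict(surrogate, (30.0, 10.0))
after = stump_predict(surrogate, (60.0, 10.0))
print(f"rule uses feature {surrogate[1]} with threshold {surrogate[2]:.1f}")
print(f"fidelity on sample: {fidelity:.2f}; what-if: {before} -> {after}")
```

The fidelity figure makes Box's maxim concrete: wherever the surrogate disagrees with the black box, the simple rule is wrong about the system it is meant to explain, so any explanation read off the rule holds only as far as that agreement does.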
The ubiquity of machine learning based predictive models in modern society naturally leads people to ask how trustworthy those models are. In predictive modeling, it is quite common to face a trade-off between accuracy and interpretability. For instance, doctors would like to know how effective some treatment will be for a patient, or why the model suggested a particular medication for a patient exhibiting certain symptoms. We acknowledge that the necessity for interpretability is a consequence of an incomplete formalisation of the problem or, more precisely, of multiple meanings attached to a particular concept. For certain problems, it is not enough to get the answer (what); the model also has to provide an explanation of how it came to that conclusion (why), because a correct prediction only partially solves the original problem. In this article we extend an existing categorisation of techniques to aid model interpretability and test this categorisation.