Interpretable deep learning is a fundamental building block towards safer AI, especially when the deployment possibilities of deep learning-based computer-aided medical diagnostic systems are so eminent. However, without a computational formulation of black-box interpretation, general interpretability research rely heavily on subjective bias. Clear decision structure of the medical diagnostics lets us approximate the decision process of a radiologist as a model - removed from subjective bias. We define the process of interpretation as a finite communication between a known model and a black-box model to optimally map the black box's decision process in the known model. Consequently, we define interpretability as maximal information gain over the initial uncertainty about the black-box's decision within finite communication. We relax this definition based on the observation that diagnostic interpretation is typically achieved by a process of minimal querying. We derive an algorithm to calculate diagnostic interpretability. The usual question of accuracy-interpretability tradeoff, i.e. whether a black-box model's prediction accuracy is dependent on its ability to be interpreted by a known source model, does not arise in this theory. With multiple example simulation experiments of various complexity levels, we demonstrate the working of such a theoretical model in synthetic supervised classification scenarios.
Any sufficiently advanced technology is indistinguishable from magic. In the world of artificial intelligence & machine learning (AI & ML), black- and white-box categorization of models and algorithms refers to their interpretability. That is, given a model trained to map data inputs to outputs (e.g. And just as the software testing dichotomy is high-level behavior vs low-level logic, only white-box AI methods can be readily interpreted to see the logic behind models' predictions. In recent years with machine learning taking over new industries and applications, where the number of users far outnumber experts that grok the models and algorithms, the conversation around interpretability has become an important one.
Update: I have since refined these ideas in The Mythos of Model Interpretability, an academic paper presented at the 2016 ICML Workshop on Human Interpretability of Machine Learning. Neural networks, on the other hand, are black boxes. By this, it's suggested that we can pass input in, and observe what comes out, but we lack the ability to reason about what happened in the middle. To confirm the prevalence of this narrative, I ran a Google search for "neural network black box", yielding 2,410,000 results. By comparison, "logistic regression black box" turns up 600,000 results.
Machine-learning models have demonstrated great success in learning complex patterns that enable them to make predictions about unobserved data. In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. However, this increased focus has led to considerable confusion about the notion of interpretability. In particular, it is unclear how the wide array of proposed interpretation methods are related, and what common concepts can be used to evaluate them. We aim to address these concerns by defining interpretability in the context of machine learning and introducing the Predictive, Descriptive, Relevant (PDR) framework for discussing interpretations. The PDR framework provides three overarching desiderata for evaluation: predictive accuracy, descriptive accuracy and relevancy, with relevancy judged relative to a human audience. Moreover, to help manage the deluge of interpretation methods, we introduce a categorization of existing techniques into model-based and post-hoc categories, with sub-groups including sparsity, modularity and simulatability. To demonstrate how practitioners can use the PDR framework to evaluate and understand interpretations, we provide numerous real-world examples. These examples highlight the often under-appreciated role played by human audiences in discussions of interpretability. Finally, based on our framework, we discuss limitations of existing methods and directions for future work. We hope that this work will provide a common vocabulary that will make it easier for both practitioners and researchers to discuss and choose from the full range of interpretation methods.
Artificial Intelligence models are becoming increasingly more powerful and accurate, supporting or even replacing humans' decision making. But with increased power and accuracy also comes higher complexity, making it hard for users to understand how the model works and what the reasons behind its predictions are. Humans must explain and justify their decisions, and so do the AI models supporting them in this process, making semantic interpretability an emerging field of study. In this work, we look at interpretability from a broader point of view, going beyond the machine learning scope and covering different AI fields such as distributional semantics and fuzzy logic, among others. We examine and classify the models according to their nature and also based on how they introduce interpretability features, analyzing how each approach affects the final users and pointing to gaps that still need to be addressed to provide more human-centered interpretability solutions.