Collaborating Authors

Building explainability into the components of machine-learning models


Explanation methods that help users understand and trust machine-learning models often describe how much certain features used in the model contribute to its prediction. For example, if a model predicts a patient's risk of developing cardiac disease, a physician might want to know how strongly the patient's heart rate data influences that prediction. But if those features are so complex or convoluted that the user can't understand them, does the explanation method do any good? MIT researchers are striving to improve the interpretability of features so decision makers will be more comfortable using the outputs of machine-learning models. Drawing on years of field work, they developed a taxonomy to help developers craft features that will be easier for their target audience to understand.

Picking an explainability technique


ML Model Explainability (sometimes referred to as Model Interpretability or ML Model Transparency) is a fundamental pillar of AI Quality. It is impossible to trust a machine learning model without understanding how and why it makes its decisions, and whether these decisions are justified. Peering into ML models is absolutely necessary before deploying them in the wild, where a poorly understood model can not only fail to achieve its objective, but also cause negative business or social impacts, or encounter regulatory trouble. Explainability is also an important backbone to other trustworthy ML pillars like fairness and stability. Yet "explainability" is often a broad and sometimes confusing concept.

Interpretable Machine Learning

Communications of the ACM

Later in this article we include an extensive discussion about best practices for this IML workflow to flesh out the taxonomy and deliver rigorously tested diagnostics to consumers. Ultimately, there could be an increasingly complete taxonomy that allows consumers (C) to find suitable IML methods for their use cases and helps researchers (R) to ground their technical work in real applications (as seen on the right side of Figure 2). For instance, the accompanying table highlights concrete examples of how three different potential diagnostics, each corresponding to different types of IML methods (local feature attribution, local counterfactual, and global counterfactual, respectively), may provide useful insights for three use cases. In particular, the computer vision use case from the table is expanded upon as a running example. An increasingly diverse set of methods has been recently proposed and broadly classified as part of IML. Multiple concerns have been expressed, however, in light of this rapid development, focused on IML's underlying foundations and the gap between research and practice.

One Explanation Does Not Fit All: A Toolkit and Taxonomy of AI Explainability Techniques Artificial Intelligence

As artificial intelligence and machine learning algorithms make further inroads into society, calls are increasing from multiple stakeholders for these algorithms to explain their outputs. At the same time, these stakeholders, whether they be affected citizens, government regulators, domain experts, or system developers, present different requirements for explanations. Toward addressing these needs, we introduce AI Explainability 360 (, an open-source software toolkit featuring eight diverse and state-of-the-art explainability methods and two evaluation metrics. Equally important, we provide a taxonomy to help entities requiring explanations to navigate the space of explanation methods, not only those in the toolkit but also in the broader literature on explainability. For data scientists and other users of the toolkit, we have implemented an extensible software architecture that organizes methods according to their place in the AI modeling pipeline. We also discuss enhancements to bring research innovations closer to consumers of explanations, ranging from simplified, more accessible versions of algorithms, to tutorials and an interactive web demo to introduce AI explainability to different audiences and application domains. Together, our toolkit and taxonomy can help identify gaps where more explainability methods are needed and provide a platform to incorporate them as they are developed.