Framework for Evaluating Faithfulness of Local Explanations

Dasgupta, Sanjoy, Frost, Nave, Moshkovitz, Michal

arXiv.org Machine Learning 

Machine learning is an integral part of many human-facing computer systems and is increasingly a key component of decisions that have profound effects on people's lives. There are many dangers that come with this. For instance, statistical models can easily be error-prone in regions of the input space that are not well-reflected in training data but that end up arising in practice. Or they can be excessively complicated in ways that impact their generalization ability. Or they might implicitly make their decisions based on criteria that would not considered acceptable by society. For all these reasons, and many others, it is crucial to have models that are understandable or can explain their predictions to humans [19]. Explanations of a classification system can take many forms, but should accurately reflect the classifier's inner workings.