Learning Global Transparent Models consistent with Local Contrastive Explanations
Neural Information Processing Systems
There is a rich and growing literature on producing local contrastive/counterfactual explanations for black-box models. In these methods, given an input, an explanation takes the form of a contrast point that differs from the original input in very few features and lies in a different class. Other works build globally interpretable models, such as decision trees and rule lists, from the data using either the actual labels or the black-box model's predictions. Although these interpretable global models can be useful, they may not be consistent with the local explanations of a specific black-box model of choice. In this work, we explore the question: Can we produce a transparent global model that is simultaneously accurate and consistent with the local (contrastive) explanations of the black-box model?
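The local contrastive explanations described above can be sketched as follows. This is a minimal illustration, not the paper's method: `black_box` is a hypothetical toy classifier, and the search simply perturbs one feature at a time until the predicted class flips, returning a contrast point that differs from the input in as few features as possible.

```python
def black_box(x):
    """Hypothetical toy black-box: class 1 if the feature sum exceeds 5."""
    return int(sum(x) > 5.0)

def contrastive_explanation(x, predict, step=0.5, max_steps=50):
    """Greedily move a single feature until the predicted class flips.

    Returns the contrast point and the indices of the changed features;
    for this monotone toy model, exactly one feature changes.
    """
    original = predict(x)
    direction = 1 if original == 0 else -1  # push toward the other class
    for i in range(len(x)):                 # try changing one feature at a time
        cf = list(x)
        for _ in range(max_steps):
            cf[i] += direction * step
            if predict(cf) != original:
                changed = [j for j in range(len(x)) if cf[j] != x[j]]
                return cf, changed
    return None, []

x = [1.0, 2.0, 1.5]                    # sum 4.5, so class 0
cf, changed = contrastive_explanation(x, black_box)
print(black_box(x), black_box(cf), changed)  # 0 1 [0]
```

Real counterfactual methods replace the greedy loop with an optimization that trades off sparsity, proximity, and plausibility, but the contract is the same: a nearby point, few changed features, different class.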