Generative causal explanations of black-box classifiers

Neural Information Processing Systems 

We develop a method for generating causal post-hoc explanations of black-box classifiers based on a learned low-dimensional representation of the data. The explanation is causal in the sense that changing learned latent factors produces a change in the classifier output statistics. To construct these explanations, we design a learning framework that leverages a generative model and information-theoretic measures of causal influence.
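As a rough illustration of what an information-theoretic measure of causal influence might look like, the sketch below estimates the mutual information between a single latent factor and a classifier's output by Monte Carlo sampling. It is a toy under stated assumptions, not the method of the paper: the choice of mutual information as the measure, the two-factor latent space, and the `decoder`, `classifier`, and `influence` functions are all hypothetical stand-ins introduced here for illustration.

```python
import numpy as np

def decoder(alpha, beta):
    # Hypothetical stand-in for a learned generative model g(alpha, beta) -> x.
    # Here it simply places the two latent factors side by side.
    return np.stack([alpha, beta], axis=-1)

def classifier(x):
    # Hypothetical black-box classifier returning [P(y=0|x), P(y=1|x)].
    # By construction it depends only on the first coordinate of x.
    p1 = 1.0 / (1.0 + np.exp(-4.0 * x[..., 0]))
    return np.stack([1.0 - p1, p1], axis=-1)

def mutual_information(cond_probs, eps=1e-12):
    # cond_probs has shape (n, classes); row i is p(y | z_i) for z_i drawn
    # from the prior. Returns the average KL(p(y|z_i) || p(y)) in nats.
    marginal = cond_probs.mean(axis=0)
    kl = np.sum(
        cond_probs * (np.log(cond_probs + eps) - np.log(marginal + eps)),
        axis=1,
    )
    return float(kl.mean())

def influence(factor, n_outer=500, n_inner=500, seed=0):
    # Estimate I(factor; Y): sample the factor of interest (outer samples)
    # and average the classifier output over the other factor (inner samples).
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_outer)
    other = rng.standard_normal(n_inner)
    zz, oo = np.meshgrid(z, other, indexing="ij")  # shape (n_outer, n_inner)
    x = decoder(zz, oo) if factor == "alpha" else decoder(oo, zz)
    probs = classifier(x)          # shape (n_outer, n_inner, 2)
    cond = probs.mean(axis=1)      # p(y | z_i), shape (n_outer, 2)
    return mutual_information(cond)

mi_alpha = influence("alpha")
mi_beta = influence("beta")
print(f"I(alpha; Y) ~ {mi_alpha:.3f} nats, I(beta; Y) ~ {mi_beta:.3f} nats")
```

In this toy setup the classifier ignores the second coordinate, so the estimate for `beta` should be essentially zero while the estimate for `alpha` is clearly positive. That mirrors the sense of "causal" in the abstract: changing a factor counts as explanatory only if it changes the classifier output statistics.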
