Generative causal explanations of black-box classifiers

Neural Information Processing Systems, Oct-9-2024, 23:56:59 GMT

We develop a method for generating causal post-hoc explanations of black-box classifiers based on a learned low-dimensional representation of the data. The explanation is causal in the sense that changing the learned latent factors produces a change in the classifier output statistics. To construct these explanations, we design a learning framework that leverages a generative model and information-theoretic measures of causal influence. Our objective function encourages the generative model to faithfully represent the data distribution and, at the same time, encourages the latent factors to have a large causal influence on the classifier output. Our method learns both global and local explanations, is compatible with any classifier that admits class probabilities and a gradient, and does not require labeled attributes or knowledge of causal structure.
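The abstract does not spell out which information-theoretic measure of causal influence is used. As a minimal sketch, assume the measure is the mutual information I(α; Y) between a subset α of the latent factors and the classifier output Y. The remaining factors β are treated as nuisance variables, and both α and β get independent standard-normal priors. Under those assumptions, I(α; Y) = E_α[KL(p(y | α) ‖ p(y))] can be estimated by Monte Carlo using only the classifier's class probabilities. The classifier, generator, sample sizes, and the `objective` weighting `lam` below are hypothetical stand-ins chosen for illustration, not the method's actual components.

```python
import math
import random


def classifier_probs(x):
    # Hypothetical black-box binary classifier returning [P(y=0|x), P(y=1|x)];
    # it reads only the first coordinate of x.
    p1 = 1.0 / (1.0 + math.exp(-4.0 * x[0]))
    return [1.0 - p1, p1]


def generator(alpha, beta):
    # Hypothetical generator g(alpha, beta) -> x: alpha sets the coordinate the
    # classifier reads, beta sets a coordinate the classifier ignores.
    return [alpha[0], beta[0]]


def causal_influence(gen, clf, n_alpha=200, n_beta=50, n_classes=2, seed=0):
    """Monte Carlo estimate (in nats) of I(alpha; Y), assuming independent
    standard-normal priors on alpha and beta and Y ~ clf(gen(alpha, beta))."""
    rng = random.Random(seed)
    cond = []  # cond[i][y] approximates p(y | alpha_i), averaged over beta
    for _ in range(n_alpha):
        alpha = [rng.gauss(0.0, 1.0)]
        p = [0.0] * n_classes
        for _ in range(n_beta):
            beta = [rng.gauss(0.0, 1.0)]
            probs = clf(gen(alpha, beta))
            for y in range(n_classes):
                p[y] += probs[y] / n_beta
        cond.append(p)
    # p(y) approximated as the average of p(y | alpha_i) over the alpha samples
    marginal = [sum(p[y] for p in cond) / n_alpha for y in range(n_classes)]
    # I(alpha; Y) = E_alpha[ KL(p(y | alpha) || p(y)) ]
    total = 0.0
    for p in cond:
        total += sum(p[y] * math.log(p[y] / marginal[y])
                     for y in range(n_classes) if p[y] > 0.0)
    return total / n_alpha


def objective(causal_effect, fidelity, lam=0.1):
    # Hypothetical combined score: causal influence plus lam times a
    # data-fidelity term (for example a VAE evidence lower bound).
    return causal_effect + lam * fidelity


if __name__ == "__main__":
    causal = causal_influence(generator, classifier_probs)
    # Swap roles so alpha only drives the coordinate the classifier ignores.
    ignored = causal_influence(lambda a, b: generator(b, a), classifier_probs)
    print(f"alpha drives the classified feature: {causal:.3f} nats")
    print(f"alpha drives an ignored feature:     {ignored:.3f} nats")
```

When α controls the feature the classifier reads, the estimate is clearly positive and bounded by log 2 for two classes. When α controls only an ignored feature, the estimate is close to zero, apart from a small Monte Carlo bias. In a real training loop, the generator's parameters would be updated to increase such an objective by backpropagating through the classifier. This is why the abstract requires the classifier to expose a gradient. The pure-Python sketch only evaluates the quantity and trains nothing.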

black-box classifier, explanation, generative causal explanation
