Review for NeurIPS paper: Generative causal explanations of black-box classifiers

Neural Information Processing Systems 

This paper presents a generative model to "explain" any given black-box classifier and its training dataset. Explanation is through a hidden factor that can control or intervene in the output of the classifier. The discovery is based on a objective with two terms: 1) a proposed Information Flow that denotes the causal effect from the hidden factor to the classifier output and 2) a distribution similarity to impose the discovered hidden factor can generate back the feature space. Reviewers found this a borderline paper. After the discussion phase all reviewers are leaning towards acceptance. They pointed out as strengths that this is a very well-written paper, presenting a simple yet effective method, with extensive ablative experiments.