Review for NeurIPS paper: Towards Interpretable Natural Language Understanding with Explanations as Latent Variables

Weaknesses: My main concern is how explanations are employed as latent variables. Based on the introduction, I had assumed that the final prediction would factor through the generated explanation. This would provide a faithfulness guarantee: two inputs that produce the same explanation would produce the same output label. However, it seems that during training the explanation is conditioned on the gold label; the paper notes on L161 that "generating explanations without a predicted label often results in irrelevant and even misleading explanations."
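
To make this concrete (a sketch in my own notation, not the paper's, with x the input, e the explanation, and y the label): the factorization I had expected is p(y|x) = \sum_e p(e|x) p(y|e), in which the label depends on the input only through the explanation, so two inputs with identical explanations receive identical label distributions. If the explanation generator is instead trained as p(e|x, y), conditioning on the gold label y, the explanation is no longer a bottleneck between input and prediction, and the faithfulness guarantee above does not obviously follow.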