Reviews: e-SNLI: Natural Language Inference with Natural Language Explanations

Neural Information Processing Systems 

I think the idea of explicable models is worth pursuing, and this is a decent contribution toward showing how one might do that. It is unfortunate that this work shows a huge tradeoff between models that perform well and models that explain well: from 4.1 it seems that we can get good performance but then cannot generate correct explanations very often, and from 4.2 it seems that we can generate correct explanations more often, but at the expense of performance. It is also disappointing that the BLEU scores in the PREDICT setting are already so close to the inter-annotator agreement even though the generated explanations are not correct very often. This suggests that the BLEU scores are not very meaningful here and that we really do need to rely on the percentage of correct explanations given by human evaluation. This seems like a bottleneck for this resource being widely adopted. Nonetheless, these findings are a solid contribution, and so is the data, provided others are willing to do human evaluation or work on a new automatic metric for a task like this.
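To make the BLEU concern above concrete, here is a minimal sketch of clipped n-gram precision, the overlap statistic that BLEU is built on. It is not full BLEU (no brevity penalty, no geometric mean over n-gram orders), and the two sentences are invented for illustration; they are not taken from the e-SNLI data or from the paper. The function name `clipped_precision` is likewise hypothetical.

```python
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(candidate, reference, n):
    """Fraction of candidate n-grams that also occur in the reference,
    with each n-gram's credit clipped to its count in the reference."""
    cand = ngram_counts(candidate.split(), n)
    ref = ngram_counts(reference.split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
    return overlap / total


# Invented example: the candidate contradicts the reference explanation,
# yet shares most of its words with it.
reference = "a dog is an animal"
wrong = "a dog is not an animal"

unigram = clipped_precision(wrong, reference, 1)
bigram = clipped_precision(wrong, reference, 2)
print(f"unigram precision: {unigram:.2f}, bigram precision: {bigram:.2f}")
```

In this toy case a single inserted word reverses the meaning of the explanation while leaving most n-grams intact, which is one plausible reason an overlap-based score could approach inter-annotator agreement without the explanations being correct.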