Reviews: Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections

Neural Information Processing Systems 

This work proposes a novel method that can potentially provide actionable insight to the user when a neural network makes a less-than-favorable decision. The paper is interesting in that the insight it provides is stable, and hence potentially actionable: it can help the target user change an undesired outcome in the future. The work focuses on asymmetric insight, in the sense that suggestions are provided only when the input is classified into a certain class. It is therefore mainly applicable to a specific kind of binary classification problem, where being classified into one class is more undesirable and requires justification. Some hand-wavy arguments for an extension to multiple classes (one vs. all) are provided in the supplement; however, it would be good to see experiments on this in practice, as it is not at all obvious how the solution extends when there is more than one undesirable class.