 correction


Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections

Neural Information Processing Systems

We present a new algorithm to generate minimal, stable, and symbolic corrections to an input that will cause a neural network with ReLU activations to change its output. We argue that such a correction is a useful way to provide feedback to a user when the network's output is different from a desired output. Our algorithm generates such a correction by solving a series of linear constraint satisfaction problems. The technique is evaluated on three neural network models: one predicting whether an applicant will pay a mortgage, one predicting whether a first-order theorem can be proved efficiently by a solver using certain heuristics, and the final one judging whether a drawing is an accurate rendition of a canonical drawing of a cat.

A Constrained sampling via post-processed denoiser

In this section, we provide more details on the apparatus necessary to perform a posteriori conditional

Neural Information Processing Systems

Eq. (6) suggests that the SDE drift corresponding to the score may be broken down into 3 steps: 1. ... However, in practice this modification creates a "discontinuity" between the constrained and unconstrained components, leading to erroneous correlations between them in the generated samples. ... "learning rate" that is determined empirically such that the loss value reduces adequately close to zero ... Thus it needs to be tuned empirically. ... The correction in Eq. (16) is equivalent to imposing a Gaussian likelihood on ... Remark 2. The post-processing presented in this section is similar to [ ... In this section, we present the most relevant components for completeness and better reproducibility. ... B.2 Sampling: The reverse SDE in Eq. (5) used for sampling may be rewritten in terms of denoiser D ... As stated in 4.1 of the main text, for this ... The energy-based metrics are already defined in Eq. (12) and Eq.