Reviews: DISCO Nets : DISsimilarity COefficients Networks

Neural Information Processing Systems 

This paper introduces a method for solving a general class of structured prediction problems. The method trains a neural network to construct an output as a deterministic function of the real input and a sample from some noise source. Entropy in the noise source becomes entropy in the output distribution. Mismatch between the model distribution and the true predictive distribution is measured using a strictly proper scoring rule, à la Gneiting and Raftery (JASA 2007). One thing that concerns me about the proposed approach is whether the "expected score" used to measure dissimilarity between the model predictions and the true predictive distribution provides a strong learning signal. Especially in the minibatch setting, I'd be worried that variance in the gradient could wipe out information about subtle mismatches between the model and true distributions.
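To make the concern concrete, here is a minimal sketch of a sample-based estimate of one strictly proper scoring rule from Gneiting and Raftery, the energy score, written in negatively oriented form (lower is better). It is not taken from the paper: `toy_model`, the Gaussian targets, the sample count of 16, and the minibatch size of 32 are all assumptions made for illustration. It also looks at the variance of the score itself rather than of its gradient, so it is only a rough proxy for the issue raised above.

```python
import numpy as np


def energy_score(samples, y, beta=1.0):
    """Sample estimate of the negatively oriented energy score (lower is better).

    samples: (K, D) draws from the model for one input, with K >= 2.
    y:       (D,) observed target for that input.
    """
    samples = np.asarray(samples, dtype=float)
    y = np.asarray(y, dtype=float)
    k = samples.shape[0]
    # Average distance between model samples and the observed target.
    fit = np.mean(np.linalg.norm(samples - y, axis=1) ** beta)
    # Average distance between distinct pairs of model samples.
    # The diagonal of the pairwise matrix is zero, so dividing the full
    # sum by K * (K - 1) averages over the off-diagonal entries only.
    pair = np.linalg.norm(samples[:, None, :] - samples[None, :, :], axis=2) ** beta
    spread = pair.sum() / (k * (k - 1))
    return fit - 0.5 * spread


def toy_model(x, z, scale):
    # Hypothetical stand-in for the network: a deterministic function of
    # the input x and the noise sample z.
    return x + scale * z


def per_example_scores(scale, targets, noise):
    # One energy-score estimate per target, using that target's noise draws.
    x = np.zeros(targets.shape[1])
    return np.array(
        [energy_score(toy_model(x, z, scale), y) for y, z in zip(targets, noise)]
    )


def mean_score(scale, targets, noise):
    return per_example_scores(scale, targets, noise).mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    targets = rng.standard_normal((4000, 2))    # assumed "true" distribution: N(0, I)
    noise = rng.standard_normal((4000, 16, 2))  # 16 noise samples per target

    # Per-example score gap between a mismatched model (scale=0.5)
    # and a matched one (scale=1.0), using the same targets and noise.
    gap = per_example_scores(0.5, targets, noise) - per_example_scores(1.0, targets, noise)
    minibatch_gaps = gap.reshape(-1, 32).mean(axis=1)  # minibatches of 32 examples
    print("mean gap over all examples:", gap.mean())
    print("std of gap across minibatches:", minibatch_gaps.std())
```

Comparing the mean gap with its spread across minibatches gives a rough sense of how much a single minibatch can say about a given mismatch. In this toy setup the mismatch is large; the worry in the review is about cases where it is much subtler.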