Reviews: Neural Variational Inference and Learning in Undirected Graphical Models

Neural Information Processing Systems 

In this paper the authors essentially propose to train an MLP to generate proposal samples, which are used to estimate the partition function Z of an undirected model. Instead of using plain importance sampling to estimate Z (which would yield an unbiased estimator of Z), they propose a bound that overestimates Z^2 *in expectation*.

While the authors highlight around line 70 that this only works when q is sufficiently close to p, I think it should be made even clearer that almost any estimate with a finite number of samples will *underestimate* Z^2 when q is not sufficiently close, because the rare high-weight samples that carry the expectation are almost never drawn. I agree with the authors that this is probably not an issue at the beginning of training, but I imagine it becomes an issue as p becomes multimodal/peaky towards convergence, when q can no longer follow that distribution. This raises the question: why would we train an undirected model p when the training and evaluation method breaks down around the point at which the jointly trained and properly normalized proposal distribution q can no longer follow it?
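To make the finite-sample point concrete, here is a minimal sketch (my own construction, not code from the paper; the names p_tilde and q_pdf are hypothetical) in which p is a bimodal mixture of two unit-variance Gaussians and q = N(0, 1) covers only one mode. The chi^2-style estimator mean((p_tilde(x)/q(x))^2) with x ~ q upper-bounds Z^2 in expectation, yet essentially every finite-sample run lands below the true Z^2:

```python
# Sketch: importance-weighted estimate of Z^2 when q misses a mode of p.
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(x):
    # Unnormalized bimodal target: unit-variance Gaussian bumps at 0 and 20.
    return np.exp(-x**2 / 2) + np.exp(-(x - 20)**2 / 2)

def q_pdf(x):
    # Proposal q = N(0, 1): covers only the mode at 0.
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

Z_sq_true = (2 * np.sqrt(2 * np.pi))**2  # Z = 2*sqrt(2*pi) for this p_tilde

estimates = []
for _ in range(100):
    x = rng.normal(0.0, 1.0, size=10_000)   # x ~ q
    w = p_tilde(x) / q_pdf(x)               # importance weights
    estimates.append(np.mean(w**2))         # finite-sample estimate of Z^2

print(f"true Z^2           = {Z_sq_true:.2f}")                # ~25.13
print(f"median estimate    = {np.median(estimates):.2f}")     # ~6.28 = Z^2/4
print(f"fraction below Z^2 = {np.mean(np.array(estimates) < Z_sq_true):.2f}")
```

Every run here confidently reports roughly the mass of the single covered mode (Z^2/4), even though the estimator's expectation is astronomically larger than Z^2; this is exactly the regime where the "upper bound in expectation" property offers no practical protection.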