Reviews: Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model
–Neural Information Processing Systems
The highlighted phenomenon (the convergence of a short-run MCMC while training EBMs) seems to be novel and very interesting. The conventional wisdom is that a simple MCMC algorithm like Langevin dynamics takes a long time to get close to the stationary distribution of the EBM when initialized far from it. The paper argues that if the EBM is trained by generating negative samples from a short-run MCMC, then the short-run MCMC chain does in fact converge close to the data distribution (the authors argue that the "closeness" is related to moment matching). The theoretical argument for explaining this phenomenon seems suggestive, but ultimately did not convince the reviewer: even the convergence of the algorithm does not seem to be explained, and Section 4.2 seems particularly weak, since it is not clear what the "generalized moment matching objective" is trying to achieve. However, the empirical evidence for the convergence of short-run MCMC in EBMs seems very compelling: the training procedure for the model is significantly simpler than other procedures used to train EBMs, yet it produces highly competitive results on several image datasets.
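The training scheme discussed above can be illustrated with a toy sketch. This is not the paper's setup: it assumes a one-dimensional quadratic energy U(x) = (x - theta)^2 / 2, uniform noise initialization on [-1, 1], and arbitrarily chosen step size, chain length, and learning rate. The names `short_run_langevin` and `train_toy_ebm` are hypothetical and exist only for this example.

```python
import numpy as np


def short_run_langevin(grad_energy, x0, n_steps, step_size, rng):
    """Run a fixed, small number of Langevin steps starting from x0.

    Update rule: x <- x - (s^2 / 2) * grad U(x) + s * standard normal noise.
    The chain is not run to convergence and is not persisted between calls.
    """
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - 0.5 * step_size**2 * grad_energy(x) + step_size * noise
    return x


def train_toy_ebm(data, n_iters=500, n_steps=20, step_size=0.3, lr=0.1, seed=0):
    """Fit theta in the toy energy U(x) = (x - theta)^2 / 2.

    Negative samples come from a short-run chain restarted from uniform
    noise at every iteration (non-persistent). For this energy,
    dU/dtheta = theta - x, so the maximum-likelihood gradient estimate
    reduces to mean(data) - mean(negative samples).
    """
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(n_iters):
        x0 = rng.uniform(-1.0, 1.0, size=data.shape)
        neg = short_run_langevin(lambda x: x - theta, x0, n_steps, step_size, rng)
        theta += lr * (data.mean() - neg.mean())
    return theta


# Synthetic "data" (an assumption for the demo): Gaussian with mean 3.
rng = np.random.default_rng(1)
data = rng.normal(3.0, 1.0, size=2000)

theta = train_toy_ebm(data)

# Draw fresh short-run samples from the learned model, again from noise.
x0 = rng.uniform(-1.0, 1.0, size=data.shape)
samples = short_run_langevin(lambda x: x - theta, x0, 20, 0.3, rng)

print("data mean:            ", data.mean())
print("short-run sample mean:", samples.mean())
print("learned theta:        ", theta)
```

In this toy case the short-run samples end up matching the data mean, while the learned theta (the mode of the energy itself) settles noticeably above it, at roughly the data mean divided by 1 - (1 - s^2/2)^K for step size s and K steps. That is one simple way to see how a non-convergent chain can match moments of the data without the energy's stationary distribution matching the data.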
Jan-22-2025, 12:35:18 GMT