Dual Training of Energy-Based Models with Overparametrized Shallow Neural Networks

Domingo-Enrich, Carles, Bietti, Alberto, Gabrié, Marylou, Bruna, Joan, Vanden-Eijnden, Eric

arXiv.org Machine Learning 

Energy-based models (EBMs) are explicit generative models that work by considering Gibbs measures defined through an energy function f, with a probability density proportional to exp(−βf(x)), where β is the inverse temperature. Such models originate in statistical physics [Gibbs, 2010; Ruelle, 1969] and have become a fundamental modeling tool in statistics and machine learning [Wainwright and Jordan, 2008; Ranzato et al., 2007; LeCun et al., 2006; Du and Mordatch, 2019; Song and Kingma, 2021]. Given data samples from a target distribution, learning algorithms for EBMs attempt to estimate an energy function f that models the density of the samples. The learned model can then be used to obtain new samples, typically through Markov chain Monte Carlo (MCMC) techniques. The standard method to train EBMs is maximum likelihood estimation, i.e., the learned energy is the one that maximizes the likelihood of the target samples within a certain function class. One generic approach is to use gradient descent, where the gradients may be approximated using MCMC samples from the model being trained. However, this is computationally difficult for highly non-convex trained energies, which in recent years has motivated a myriad of alternative losses for learning EBM energies, such as the popular score matching; see [Song and Kingma, 2021] for a review. EBMs also have structural connections with maximum entropy (maxent) models, which have been studied for decades through Fenchel duality. Dai et al. [2019b] was the first work to leverage similar duality arguments