Information-Theoretic Diffusion
Kong, Xianghao; Brekelmans, Rob; Ver Steeg, Greg
–arXiv.org Artificial Intelligence
Denoising diffusion models have spurred significant gains in density modeling and image generation, precipitating an industrial revolution in text-guided AI art generation. We introduce a new mathematical foundation for diffusion models inspired by classic results in information theory that connect information with Minimum Mean Square Error regression, the so-called I-MMSE relations. We generalize the I-MMSE relations to exactly relate the data distribution to an optimal denoising regression problem, leading to an elegant refinement of existing diffusion bounds. This new insight leads to several improvements for probability distribution estimation, including theoretical justification for diffusion model ensembling. Remarkably, our framework shows how continuous and discrete probabilities can be learned with the same regression objective, avoiding domain-specific generative models used in variational methods.

Denoising diffusion models (Sohl-Dickstein et al., 2015) incorporating recent improvements (Ho et al., 2020) now outperform GANs for image generation (Dhariwal & Nichol, 2021), and also lead to better density models than the previously state-of-the-art autoregressive models (Kingma et al., 2021). The quality and flexibility of image results have led to major new industrial applications for automatically generating diverse and realistic images from open-ended text prompts (Ramesh et al., 2022; Saharia et al., 2022; Rombach et al., 2022). Mathematically, diffusion models can be understood in a variety of ways: as classic denoising autoencoders (Vincent, 2011) with multiple noise levels and a new architecture (Ho et al., 2020), as VAEs with a fixed noising encoder (Kingma et al., 2021), as annealed score matching models (Song & Ermon, 2019), as a non-equilibrium process that tractably bridges between a target distribution and a Gaussian (Sohl-Dickstein et al., 2015), or as a stochastic differential equation that does the same (Song et al., 2020; Liu et al., 2022).
In this paper, we call attention to a connection between diffusion models and a classic result in information theory relating the mutual information to the Minimum Mean Square Error (MMSE) estimator for denoising a Gaussian noise channel (Guo et al., 2005). Research on information and MMSE (often referred to as I-MMSE relations) transformed information theory, with new representations of standard measures leading to elegant proofs of fundamental results (Verdú & Guo, 2006). This paper uses a generalization of the I-MMSE relation to derive an exact relation between the data probability distribution and an optimal denoising regression problem. The information-theoretic formulation of diffusion simplifies and improves existing results.
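The classic I-MMSE identity of Guo et al. (2005) states that for the Gaussian channel Y = sqrt(snr)·X + N with N ~ N(0, 1), the derivative of the mutual information satisfies dI(X; Y)/d(snr) = ½·mmse(snr). A minimal numerical sketch can check this in the scalar Gaussian case, where both sides have closed forms (the source variance `sigma2` and helper names `mutual_info` and `mmse` below are illustrative choices, not from the paper):

```python
import numpy as np

sigma2 = 2.0  # variance of the Gaussian source X (illustrative choice)

def mutual_info(snr):
    # Closed-form I(X; sqrt(snr)*X + N) for X ~ N(0, sigma2), N ~ N(0, 1)
    return 0.5 * np.log(1.0 + snr * sigma2)

def mmse(snr):
    # Minimum mean square error of estimating X from the channel output
    return sigma2 / (1.0 + snr * sigma2)

snr = 1.5
eps = 1e-6
# Central finite difference approximates dI/d(snr)
dI_dsnr = (mutual_info(snr + eps) - mutual_info(snr - eps)) / (2 * eps)
print(dI_dsnr, 0.5 * mmse(snr))  # the two quantities should match closely
```

In the diffusion setting, the closed-form `mmse` above is replaced by the mean square error of a learned denoiser evaluated across noise levels, which is what allows the regression objective to recover the data distribution exactly.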
Feb-7-2023