Convergence of score-based generative modeling for general data distributions

Holden Lee, Jianfeng Lu, Yixin Tan

arXiv.org Artificial Intelligence 

Diffusion models have gained huge popularity in recent years in machine learning as a method to learn and generate new samples from a data distribution. Score-based generative modeling (SGM), a particular kind of diffusion model, uses learned score functions (gradients of the log-pdf) to transform white noise into the data distribution by following a stochastic differential equation. While SGM has achieved state-of-the-art performance for artificial image and audio generation [SE19; Dat+19; Gra+19; SE20; Son+20; Men+21; Son+21b; Son+21a; Jin+22], including being a key component of text-to-image systems [Ram+22], our theoretical understanding of these models is still nascent. In particular, basic questions on the convergence of the generated distribution to the data distribution remain unanswered. Recent theoretical work on SGM has attempted to answer these questions [De +21; LLT22; De 22], but these results either suffer from exponential dependence on parameters or rely on strong assumptions on the data distribution, such as functional inequalities or smoothness, which are rarely satisfied in practical situations. For example, in the hallmark application of generating images from text, we expect the distribution of images to be (a) multimodal, and hence not satisfying functional inequalities with reasonable constants, and (b) supported on lower-dimensional manifolds, and hence not smooth. Nevertheless, SGM still performs remarkably well in these settings. Indeed, this is one advantage of SGM relative to other approaches to generative modeling such as generative adversarial networks, which can struggle to learn multimodal distributions [ARZ18]. In this work, we aim to develop theoretical convergence guarantees with polynomial complexity for SGM under minimal data assumptions.
