Generalized Diffusion Model with Adjusted Offset Noise

Kutsuna, Takuro

arXiv.org Machine Learning 

A primary objective of statistical machine learning is to model data distributions, a task that underpins recent advances in generative artificial intelligence. The goal is to estimate a model that approximates an unknown distribution from samples drawn from it. For example, when the data consists of images, the estimated model can be used to generate synthetic images that follow the same distribution. Diffusion models [28, 11, 29, 14] have emerged as powerful tools for estimating probability distributions and generating new data samples. They have been shown to outperform other generative models, such as generative adversarial networks (GANs) [6], particularly in image generation tasks [5]. Owing to their flexibility and effectiveness, diffusion models are now used in a wide range of applications, including drug design [3, 8], audio synthesis [17], and text generation [1, 18].

A well-known limitation of diffusion models for image generation is their difficulty in producing images whose brightness is extremely low or extremely high across the entire image [9, 19, 12]. For example, Stable Diffusion [26], a popular diffusion model for text-conditional image generation, has been reported to struggle to generate fully black or fully white images when given prompts such as "Solid black image" or "A white background" [19].