Decouple-Then-Merge: Towards Better Training for Diffusion Models

Ma, Qianli; Ning, Xuefei; Liu, Dongrui; Niu, Li; Zhang, Linfeng

Oct-9-2024–arXiv.org Artificial Intelligence

Diffusion models are trained by learning a sequence of models that reverse each step of noise corruption. Typically, the model parameters are fully shared across timesteps to improve training efficiency. However, since the denoising task differs at each timestep, the gradients computed at different timesteps may conflict, potentially degrading the overall performance of image generation. To address this issue, this work proposes a Decouple-then-Merge (DeMe) framework, which begins with a pretrained model and finetunes separate models tailored to specific timesteps. We introduce several improved techniques during the finetuning stage to promote effective knowledge sharing while minimizing training interference across timesteps. After finetuning, these separate models can be merged into a single model in the parameter space, ensuring efficient and practical inference. Experimental results show significant improvements in generation quality on six benchmarks: Stable Diffusion on COCO30K, ImageNet1K, and PartiPrompts, and DDPM on LSUN Church, LSUN Bedroom, and CIFAR10.

Generative modeling has seen significant progress in recent years, primarily driven by the development of Diffusion Probabilistic Models (DPMs) (Ho et al., 2020; Nichol & Dhariwal, 2021; Rombach et al., 2022b). These models have been applied to various tasks such as text-to-image generation (Rombach et al., 2022a), image-to-image translation (Saharia et al., 2022a), image editing (Yang et al., 2023a), and video generation (Ho et al., 2022; Blattmann et al., 2023), yielding excellent performance. Compared with other generative models such as variational auto-encoders (VAEs) (Kingma & Welling, 2013) and generative adversarial networks (GANs) (Goodfellow et al., 2014), the most distinctive characteristic of DPMs is that they need to learn a sequence of models for denoising at multiple timesteps.
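The decouple-then-merge idea from the abstract can be illustrated with a minimal sketch. The abstract only states that the finetuned models are merged "in the parameter space"; the specific rule used here (adding weighted parameter differences back onto the pretrained weights) is an assumption made for illustration, as are the function names `split_timesteps`, `task_vector`, and `merge`, the equal-width timestep ranges, and the merge weights. Parameters are represented as plain dictionaries of float lists so that the example runs without any deep learning library, and the finetuning step itself is not shown.

```python
from typing import Dict, List

# Hypothetical stand-in for a model's parameters: name -> flat list of floats.
Params = Dict[str, List[float]]


def split_timesteps(total_steps: int, num_ranges: int) -> List[range]:
    """Decouple: partition [0, total_steps) into contiguous, near-equal ranges.

    Each range would be assigned to one finetuned copy of the pretrained model.
    Equal-width ranges are an assumption of this sketch.
    """
    bounds = [i * total_steps // num_ranges for i in range(num_ranges + 1)]
    return [range(bounds[i], bounds[i + 1]) for i in range(num_ranges)]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """Parameter-wise difference between a finetuned model and the pretrained one."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def merge(pretrained: Params, finetuned_models: List[Params],
          weights: List[float]) -> Params:
    """Merge: pretrained + sum_i weights[i] * (finetuned_i - pretrained).

    The weighted-sum rule and the weights are illustrative assumptions,
    not the rule confirmed by the abstract.
    """
    merged = {name: list(values) for name, values in pretrained.items()}
    for model, weight in zip(finetuned_models, weights):
        delta = task_vector(model, pretrained)
        for name in merged:
            merged[name] = [m + weight * d for m, d in zip(merged[name], delta[name])]
    return merged


# Toy usage: two "finetuned" copies, each changing a different parameter.
pretrained = {"w": [1.0, 2.0]}
finetuned_a = {"w": [1.5, 2.0]}  # imagined result of finetuning on one timestep range
finetuned_b = {"w": [1.0, 3.0]}  # imagined result of finetuning on another range
ranges = split_timesteps(1000, 4)
merged = merge(pretrained, [finetuned_a, finetuned_b], [1.0, 0.5])
```

Because the merged result is a single set of parameters with the same shape as the pretrained model, inference cost is unchanged, which matches the abstract's point about efficient and practical inference.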

artificial intelligence, diffusion model, machine learning

