guidance weight
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- North America > United States (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.46)
Learn to Guide Your Diffusion Model
Galashov, Alexandre, Pokle, Ashwini, Doucet, Arnaud, Gretton, Arthur, Delbracio, Mauricio, De Bortoli, Valentin
Classifier-free guidance (CFG) is a widely used technique for improving the perceptual quality of samples from conditional diffusion models. It operates by linearly combining conditional and unconditional score estimates using a guidance weight $ω$. While a large, static weight can markedly improve visual results, this often comes at the cost of poorer distributional alignment. In order to better approximate the target conditional distribution, we instead learn guidance weights $ω_{c,(s,t)}$, which are continuous functions of the conditioning $c$, the time $t$ from which we denoise, and the time $s$ towards which we denoise. We achieve this by minimizing the distributional mismatch between noised samples from the true conditional distribution and samples from the guided diffusion process. We extend our framework to reward guided sampling, enabling the model to target distributions tilted by a reward function $R(x_0,c)$, defined on clean data and a conditioning $c$. We demonstrate the effectiveness of our methodology on low-dimensional toy examples and high-dimensional image settings, where we observe improvements in Fréchet inception distance (FID) for image generation. In text-to-image applications, we observe that employing a reward function given by the CLIP score leads to guidance weights that improve image-prompt alignment.
- North America > United States > New Mexico > Bernalillo County > Albuquerque (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Stage-wise Dynamics of Classifier-Free Guidance in Diffusion Models
Jin, Cheng, Shi, Qitan, Gu, Yuantao
Classifier-Free Guidance (CFG) is widely used to improve conditional fidelity in diffusion models, but its impact on sampling dynamics remains poorly understood. Prior studies, often restricted to unimodal conditional distributions or simplified cases, provide only a partial picture. We analyze CFG under multimodal conditionals and show that the sampling process unfolds in three successive stages. In the Direction Shift stage, guidance accelerates movement toward the weighted mean, introducing initialization bias and norm growth. In the Mode Separation stage, local dynamics remain largely neutral, but the inherited bias suppresses weaker modes, reducing global diversity. In the Concentration stage, guidance amplifies within-mode contraction, diminishing fine-grained variability. This unified view explains a widely observed phenomenon: stronger guidance improves semantic alignment but inevitably reduces diversity. Experiments support these predictions, showing that early strong guidance erodes global diversity, while late strong guidance suppresses fine-grained variation. Moreover, our theory naturally suggests a time-varying guidance schedule, and empirical results confirm that it consistently improves both quality and diversity.
- North America > United States > New York (0.04)
- North America > Canada > Ontario > Toronto (0.04)
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Maryland (0.04)
- (3 more...)
- Leisure & Entertainment > Sports (0.46)
- Law (0.46)
Angle Domain Guidance: Latent Diffusion Requires Rotation Rather Than Extrapolation
Jin, Cheng, Xiao, Zhenyu, Liu, Chutao, Gu, Yuantao
Classifier-free guidance (CFG) has emerged as a pivotal advancement in text-to-image latent diffusion models, establishing itself as a cornerstone technique for achieving high-quality image synthesis. However, under high guidance weights, where text-image alignment is significantly enhanced, CFG also leads to pronounced color distortions in the generated images. We identify that these distortions stem from the amplification of sample norms in the latent space. We present a theoretical framework that elucidates the mechanisms of norm amplification and anomalous diffusion phenomena induced by classifier-free guidance. Leveraging our theoretical insights and the latent space structure, we propose an Angle Domain Guidance (ADG) algorithm. ADG constrains magnitude variations while optimizing angular alignment, thereby mitigating color distortions while preserving the enhanced text-image alignment achieved at higher guidance weights. Experimental results demonstrate that ADG significantly outperforms existing methods, generating images that not only maintain superior text alignment but also exhibit improved color fidelity and better alignment with human perceptual preferences.
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Beijing > Beijing (0.04)
- North America > Canada (0.04)
- (2 more...)