

Diffusing Differentiable Representations

Neural Information Processing Systems

We introduce a novel, training-free method for sampling differentiable representations (diffreps) using pretrained diffusion models. Rather than merely mode-seeking, our method achieves sampling by "pulling back" the dynamics of the



Consistency Purification: Effective and Efficient Diffusion Purification towards Certified Robustness

Neural Information Processing Systems

Diffusion Purification, which purifies noised images with diffusion models, has been widely used for enhancing certified robustness via randomized smoothing. However, existing frameworks often grapple with the balance between efficiency and effectiveness. While the Denoising Diffusion Probabilistic Model (DDPM) offers an efficient single-step purification, it falls short in ensuring purified images reside on the data manifold. Conversely, the Stochastic Diffusion Model effectively places purified images on the data manifold but demands solving cumbersome stochastic differential equations, while its derivative, the Probability Flow Ordinary Differential Equation (PF-ODE), though solving simpler ordinary differential equations, still requires multiple computational steps. In this work, we demonstrate that an ideal purification pipeline should generate purified images that lie on the data manifold and are as semantically aligned with the original images as possible (for effectiveness), in a single step (for efficiency). We therefore introduce Consistency Purification, a purifier that is Pareto-superior to previous work in the efficiency-effectiveness trade-off.
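The one-step purification idea can be illustrated with a toy sketch. Here the "data manifold" is the unit circle in R^2 and the one-step purifier is radial projection onto it; a trained consistency model plays this role for images, mapping a noised input back to the manifold in a single step. The function name and circle manifold are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def one_step_purify(x_noisy):
    """Idealized one-step purifier for a toy data manifold (the unit
    circle in R^2): project the noised point radially back onto the
    manifold. A trained consistency model plays this role for images."""
    norm = np.linalg.norm(x_noisy)
    return x_noisy / norm if norm > 0 else np.array([1.0, 0.0])

rng = np.random.default_rng(0)
x_clean = np.array([1.0, 0.0])                    # a point on the manifold
x_noisy = x_clean + 0.3 * rng.standard_normal(2)  # randomized-smoothing noise
x_pure = one_step_purify(x_noisy)

print(round(float(np.linalg.norm(x_pure)), 6))  # back on the manifold
print(float(x_pure @ x_clean) > 0)              # still aligned with the original
```

The two printed checks correspond to the abstract's two desiderata: the purified point lies on the manifold (effectiveness) and stays semantically aligned with the clean input, using only one function evaluation (efficiency).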




Align Your Flow: Scaling Continuous-Time Flow Map Distillation

Sabour, Amirmojtaba, Fidler, Sanja, Kreis, Karsten

arXiv.org Artificial Intelligence

Diffusion- and flow-based models have emerged as state-of-the-art generative modeling approaches, but they require many sampling steps. Consistency models can distill these models into efficient one-step generators; however, unlike flow- and diffusion-based methods, their performance inevitably degrades when increasing the number of steps, which we show both analytically and empirically. Flow maps generalize these approaches by connecting any two noise levels in a single step and remain effective across all step counts. In this paper, we introduce two new continuous-time objectives for training flow maps, along with additional novel training techniques, generalizing existing consistency and flow matching objectives. We further demonstrate that autoguidance can improve performance, using a low-quality model for guidance during distillation, and an additional boost can be achieved by adversarial finetuning, with minimal loss in sample diversity. We extensively validate our flow map models, called Align Your Flow, on challenging image generation benchmarks and achieve state-of-the-art few-step generation performance on both ImageNet 64x64 and 512x512, using small and efficient neural networks. Finally, we show text-to-image flow map models that outperform all existing non-adversarially trained few-step samplers in text-conditioned synthesis.
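The defining property of a flow map, that it connects any two noise levels in one step and composes consistently across step counts, can be checked exactly in a toy rectified flow whose data distribution is a point mass (so the map is analytic). The interpolant, endpoint `x1`, and function name are illustrative assumptions, not the paper's parameterization.

```python
import numpy as np

# Toy rectified flow with data a point mass at x1: the interpolant is
# x_t = (1 - t) * x0 + t * x1 with noise x0 ~ N(0, 1) and t in [0, 1]
# (t = 0 is pure noise, t = 1 is data). A flow map phi(x, t, s) jumps
# directly between any two noise levels t and s; a consistency model is
# the special case s = 1.
def flow_map(x_t, t, s, x1=2.0):
    """Exact flow map between noise levels t and s (requires t < 1)."""
    x0 = (x_t - t * x1) / (1.0 - t)  # recover the noise endpoint
    return (1.0 - s) * x0 + s * x1

x = -0.7                                              # sample at t = 0 (noise)
one_step = flow_map(x, 0.0, 0.9)                      # single big jump
two_step = flow_map(flow_map(x, 0.0, 0.5), 0.5, 0.9)  # two smaller jumps
print(np.isclose(one_step, two_step))                 # same endpoint: True
```

The final check is the self-consistency property the abstract emphasizes: unlike a consistency model, a flow map remains exact whether you take one large step or several smaller ones.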




Probability-Flow ODE in Infinite-Dimensional Function Spaces

Na, Kunwoo, Lee, Junghyun, Yun, Se-Young, Lim, Sungbin

arXiv.org Machine Learning

Diffusion models (Sohl-Dickstein et al., 2015; Ho et al., 2020; Song et al., 2021b; Kingma et al., 2021) are a class of generative models that add noise to real data to train a score network and sequentially approximate the time-reversed process (Föllmer and Wakolbinger, 1986; Anderson, 1982) to generate samples from the true data distribution. This model class has shown remarkable empirical success in numerous domains such as image generation (Song et al., 2021b,a), video generation (Luo et al., 2023), medical data processing (Song et al., 2022; Chung and Ye, 2022; Akrout et al., 2023), and audio generation (Kong et al., 2020). However, "classical" diffusion models formulated on finite-dimensional Euclidean spaces limit their applicability to function generation problems, as they can only generate function values realized on a fixed discretization of the function's domain (Li et al., 2020) and cannot capture functional properties of the data such as integrability or smoothness (Kerrigan et al., 2023). Motivated by this limitation of finite-dimensional models, a line of work has extended the finite-dimensional diffusion model to infinite-dimensional Hilbert spaces; for instance, Hagemann et al. (2023); Kerrigan et al. (2023); Lim et al. (2023a,b); Pidstrigach et al. (2023); Phillips et al. (2022); Baldassari et al. (2023). Kerrigan et al. (2023) propose a discrete-time model that serves as an analog of Ho et al. (2020) in infinite-dimensional space, and Hagemann et al. (2023) introduce a finite-dimensional approximation of infinite-dimensional SDEs and utilize the time-reversal formula in finite-dimensional spaces. Lim et al. (2023a); Franzese et al. (2023); Pidstrigach et al. (2023) propose continuous-time models by extending the SDE framework of Song et al. (2021b) to infinite dimensions based on semigroup theory (see
Da Prato and Zabczyk (2014)); however, their consideration is limited to a relatively simple class of SDEs, such as Langevin-type SDEs or SDEs with constant-time diffusion coefficients. Later, Lim et al. (2023b) proved a general form of the time-reversal formula which encompasses various choices of SDEs such as VPSDE, VESDE, sub-VPSDE (Song et al., 2021b) and variance scheduling (Nichol and
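The finite-dimensional probability-flow ODE that this work lifts to function spaces can be checked numerically in one dimension. For a VE-type SDE with Gaussian data the marginal and its score are closed-form, so the PF-ODE dx/dt = -0.5 * (d sigma^2/dt) * grad log p_t(x) can be integrated with plain Euler steps and verified to transport noise back to the data scale. The data scale `s0` and schedule sigma(t) = t are illustrative choices, not the paper's.

```python
import numpy as np

# For dx = sqrt(d sigma^2/dt) dW with data x_0 ~ N(0, s0^2), the marginal
# is p_t = N(0, s0^2 + sigma(t)^2), so the score is known exactly:
# grad log p_t(x) = -x / (s0^2 + sigma(t)^2).
s0 = 0.5
def sigma2(t): return t * t          # sigma(t) = t, hence d sigma^2/dt = 2t

rng = np.random.default_rng(0)
T, n_steps = 1.0, 1000
dt = T / n_steps
x = rng.standard_normal(10_000) * np.sqrt(s0**2 + sigma2(T))  # samples from p_T

for i in range(n_steps):             # integrate the PF-ODE from t = T to t = 0
    t = T - i * dt
    score = -x / (s0**2 + sigma2(t))
    x -= (-0.5 * (2 * t) * score) * dt   # reverse-time Euler step

print(round(float(x.std()), 2))      # close to the data scale s0 = 0.5
```

Because the PF-ODE is deterministic, the same marginals as the SDE are obtained without any stochastic integration, which is the property the infinite-dimensional construction in the paper generalizes.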


Solving Linear-Gaussian Bayesian Inverse Problems with Decoupled Diffusion Sequential Monte Carlo

Kelvinius, Filip Ekström, Zhao, Zheng, Lindsten, Fredrik

arXiv.org Machine Learning

A recent line of research has exploited pre-trained generative diffusion models as priors for solving Bayesian inverse problems. We contribute to this research direction by designing a sequential Monte Carlo method for linear-Gaussian inverse problems which builds on "decoupled diffusion", where the generative process is designed such that larger updates to the sample are possible. Previous methods for posterior sampling with diffusion priors, while providing impressive results on tasks like image reconstruction (Kawar et al., 2022; Chung et al., 2023; Song et al., 2023), often rely on approximations and fail or perform poorly on simple tasks (Cardoso et al., 2024, and our Section 5.1), making it uncertain to what extent they can solve Bayesian inference problems in general.
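The sequential Monte Carlo machinery underlying this approach (importance weights, normalization, resampling) can be shown on a linear-Gaussian case where the posterior is known in closed form. This is a single importance-sampling step with a prior proposal, not the paper's decoupled-diffusion DDSMC algorithm; the observation, noise variance, and particle count are illustrative.

```python
import numpy as np

# Toy linear-Gaussian inverse problem: y = x + noise, prior x ~ N(0, 1),
# noise ~ N(0, r2). The exact posterior is N(y / (1 + r2), r2 / (1 + r2)),
# so the particle estimate can be checked against it.
rng = np.random.default_rng(1)
y, r2, N = 1.5, 0.5, 200_000

particles = rng.standard_normal(N)            # draws from the prior proposal
log_w = -0.5 * (y - particles) ** 2 / r2      # Gaussian log-likelihood weights
w = np.exp(log_w - log_w.max())               # stabilize before exponentiating
w /= w.sum()                                  # self-normalized weights
resampled = rng.choice(particles, size=N, p=w)  # multinomial resampling

print(round(float(resampled.mean()), 2))      # ~ y / (1 + r2) = 1.0
```

In a full SMC method this weight-and-resample step is repeated along a sequence of intermediate targets (here, the diffusion noise levels), which is where the design of the generative process matters.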


Consistent Flow Distillation for Text-to-3D Generation

Yan, Runjie, Chen, Yinbo, Wang, Xiaolong

arXiv.org Artificial Intelligence

Score Distillation Sampling (SDS) has made significant strides in distilling image-generative models for 3D generation. However, its maximum-likelihood-seeking behavior often leads to degraded visual quality and diversity, limiting its effectiveness in 3D applications. In this work, we propose Consistent Flow Distillation (CFD), which addresses these limitations. We begin by leveraging the gradient of the diffusion ODE or SDE sampling process to guide the 3D generation. From the gradient-based sampling perspective, we find that the consistency of 2D image flows across different viewpoints is important for high-quality 3D generation. To achieve this, we introduce multi-view consistent Gaussian noise on the 3D object, which can be rendered from various viewpoints to compute the flow gradient. Our experiments demonstrate that CFD, through consistent flows, significantly outperforms previous methods in text-to-3D generation.
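The multi-view consistent noise idea can be sketched with a stand-in for rendering: attach Gaussian noise to the 3D object (one value per surface point) and "render" it from each viewpoint by looking up the noise of the visible points, so two views that see the same surface point receive identical noise, unlike sampling fresh 2D noise per view. The point cloud, visibility index sets, and function name are illustrative assumptions, not the paper's renderer.

```python
import numpy as np

rng = np.random.default_rng(0)
points = rng.standard_normal((100, 3))   # toy surface points of a 3D object
point_noise = rng.standard_normal(100)   # noise lives on the 3D object itself

def render_noise(visible_idx):
    """A real renderer would rasterize the noisy surface; here visibility
    is just an index set of the points seen from a given camera."""
    return point_noise[visible_idx]

view_a = render_noise(np.array([3, 17, 42]))   # points visible from camera A
view_b = render_noise(np.array([42, 17, 99]))  # points visible from camera B

# The surface points shared by both views (17 and 42) get identical noise:
print(bool(view_a[1] == view_b[1]), bool(view_a[2] == view_b[0]))
```

This per-object noise is what lets the flow gradients computed from different viewpoints agree with each other, which the abstract identifies as the key to high-quality 3D generation.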