picard iteration
Parallel Sampling of Diffusion Models
Diffusion models are powerful generative models but suffer from slow sampling, often taking 1000 sequential denoising steps for one sample. As a result, considerable efforts have been directed toward reducing the number of denoising steps, but these methods hurt sample quality. Instead of reducing the number of denoising steps (trading quality for speed), in this paper we explore an orthogonal approach: can we run the denoising steps in parallel (trading compute for speed)? In spite of the sequential nature of the denoising steps, we show that surprisingly it is possible to parallelize sampling via Picard iterations, by guessing the solution of future denoising steps and iteratively refining until convergence. With this insight, we present ParaDiGMS, a novel method to accelerate the sampling of pretrained diffusion models by denoising multiple steps in parallel. ParaDiGMS is the first diffusion sampling method that enables trading compute for speed and is even compatible with existing fast sampling techniques such as DDIM and DPMSolver. Using ParaDiGMS, we improve sampling speed by 2-4x across a range of robotics and image generation models, giving state-of-the-art sampling speeds of 0.2s on 100-step DiffusionPolicy and 14.6s on 1000-step StableDiffusion-v2 with no measurable degradation of task reward, FID score, or CLIP score.1
ARelated Work
We note that these results are about two of the most commonly used architecture modifications for RNNs. First, the gating mechanism is ubiquitous in RNNs, and usually thought of as a heuristic for smoothing optimization [28]. Second, many of the effective large-scale RNNs use linear (gated) recurrences and deeper models, which is usually thought of as a heuristic for computational efficiency [5]. Our results suggest that neither of these are heuristics after all, and arise from standard ways to approximate ODEs. To be more specific, we show that: 19 Table 6: A summary of the characteristics of popular RNN methods and their approximation mechanisms for capturing the dynamics x(t) = x(t) + f(t,x(t)) (equation (14)). The LSSL entries are for the very specific case with order N = 1 and A= 1,B = 1,C = 1,D= 0; LSSLs are more general.
The Picard-Lagrange Framework for Higher-Order Langevin Monte Carlo
Mahajan, Jaideep, Zhang, Kaihong, Liang, Feng, Liu, Jingbo
Sampling from log-concave distributions is a central problem in statistics and machine learning. Prior work establishes theoretical guarantees for Langevin Monte Carlo algorithm based on overdamped and underdamped Langevin dynamics and, more recently, some third-order variants. In this paper, we introduce a new sampling algorithm built on a general $K$th-order Langevin dynamics, extending beyond second- and third-order methods. To discretize the $K$th-order dynamics, we approximate the drift induced by the potential via Lagrange interpolation and refine the node values at the interpolation points using Picard-iteration corrections, yielding a flexible scheme that fully utilizes the acceleration of higher-order Langevin dynamics. For targets with smooth, strongly log-concave densities, we prove dimension-dependent convergence in Wasserstein distance: the sampler achieves $\varepsilon$-accuracy within $\widetilde O(d^{\frac{K-1}{2K-3}}\varepsilon^{-\frac{2}{2K-3}})$ gradient evaluations for $K \ge 3$. To our best knowledge, this is the first sampling algorithm achieving such query complexity. The rate improves with the order $K$ increases, yielding better rates than existing first to third-order approaches.
A Unifying Framework for Parallelizing Sequential Models with Linear Dynamical Systems
Gonzalez, Xavier, Buchanan, E. Kelly, Lee, Hyun Dong, Liu, Jerry Weihong, Wang, Ke Alexander, Zoltowski, David M., Ré, Christopher, Linderman, Scott W.
Harnessing parallelism in seemingly sequential models is a central challenge for modern machine learning. Several approaches have been proposed for evaluating sequential processes in parallel using fixed-point methods, like Newton, Picard, and Jacobi iterations. In this work, we show that these methods can be understood within a common framework based on linear dynamical systems (LDSs), where different iteration schemes arise naturally as approximate linearizations of a nonlinear recursion. This unifying view highlights shared principles behind these techniques and clarifies when particular fixed-point methods are most likely to be effective. By bridging diverse algorithms through the language of LDSs, our framework provides a clearer theoretical foundation for parallelizing sequential models and points toward new opportunities for efficient and scalable computation.
Parallel Sampling of Diffusion Models on $SO(3)$
Chen, Yan-Ting, Chen, Hao-Wei, Hsiao, Tsu-Ching, Lee, Chun-Yi
In this paper, we design an algorithm to accelerate the diffusion process on the $SO(3)$ manifold. The inherently sequential nature of diffusion models necessitates substantial time for denoising perturbed data. To overcome this limitation, we proposed to adapt the numerical Picard iteration for the $SO(3)$ space. We demonstrate our algorithm on an existing method that employs diffusion models to address the pose ambiguity problem. Moreover, we show that this acceleration advantage occurs without any measurable degradation in task reward. The experiments reveal that our algorithm achieves a speed-up of up to 4.9$\times$, significantly reducing the latency for generating a single sample.
You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts
Dou, Hongkun, Li, Zeyu, Jiang, Xingyu, Li, Hongjue, Yang, Lijun, Yao, Wen, Deng, Yue
Diffusion models (DMs) have recently demonstrated remarkable success in modeling large-scale data distributions. However, many downstream tasks require guiding the generated content based on specific differentiable metrics, typically necessitating backpropagation during the generation process. This approach is computationally expensive, as generating with DMs often demands tens to hundreds of recursive network calls, resulting in high memory usage and significant time consumption. In this paper, we propose a more efficient alternative that approaches the problem from the perspective of parallel denoising. We show that full backpropagation throughout the entire generation process is unnecessary. The downstream metrics can be optimized by retaining the computational graph of only one step during generation, thus providing a shortcut for gradient propagation. The resulting method, which we call Shortcut Diffusion Optimization (SDO), is generic, high-performance, and computationally lightweight, capable of optimizing all parameter types in diffusion sampling. We demonstrate the effectiveness of SDO on several real-world tasks, including controlling generation by optimizing latent and aligning the DMs by fine-tuning network parameters. Compared to full backpropagation, our approach reduces computational costs by $\sim 90\%$ while maintaining superior performance. Code is available at https://github.com/deng-ai-lab/SDO.
Parallel simulation for sampling under isoperimetry and score-based diffusion models
Zhou, Huanjian, Sugiyama, Masashi
In recent years, there has been a surge of interest in proving discretization bounds for sampling under isoperimetry and for diffusion models. As data size grows, reducing the iteration cost becomes an important goal. Inspired by the great success of the parallel simulation of the initial value problem in scientific computation, we propose parallel Picard methods for sampling tasks. Rigorous theoretical analysis reveals that our algorithm achieves better dependence on dimension $d$ than prior works in iteration complexity (i.e., reduced from $\widetilde{O}(\log^2 d)$ to $\widetilde{O}(\log d)$), which is even optimal for sampling under isoperimetry with specific iteration complexity. Our work highlights the potential advantages of simulation methods in scientific computation for dynamics-based sampling and diffusion models.