Lipschitz regularity in Flow Matching and Diffusion Models: sharp sampling rates and functional inequalities
Under general assumptions on the target distribution $p^\star$, we establish a sharp Lipschitz regularity theory for flow-matching vector fields and diffusion-model scores, with optimal dependence on time and dimension. As applications, we obtain Wasserstein discretization bounds for Euler-type samplers in dimension $d$: with $N$ discretization steps, the error achieves the optimal rate $\sqrt{d}/N$ up to logarithmic factors. Moreover, the constants do not deteriorate exponentially with the spatial extent of $p^\star$. We also show that the one-sided Lipschitz control yields a globally Lipschitz transport map from the standard Gaussian to $p^\star$, which implies Poincaré and log-Sobolev inequalities for a broad class of probability measures.
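The Euler-type sampler analyzed in the abstract can be illustrated with a minimal sketch: discretize the flow-matching ODE $\dot x_t = v(t, x_t)$ with $N$ uniform steps. The vector field below (`lambda t, x: -x`) is a toy stand-in for a learned velocity model, not the paper's construction; the abstract's $\sqrt{d}/N$ bound concerns the Wasserstein error of exactly this kind of scheme.

```python
import numpy as np

def euler_sample(vector_field, x0, n_steps=100):
    """Forward Euler discretization of dx/dt = v(t, x) on t in [0, 1].

    `vector_field` is a hypothetical callable v(t, x); in practice it
    would be a trained flow-matching network.
    """
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * vector_field(t, x)  # one Euler step
    return x

# Toy vector field v(t, x) = -x: samples contract toward the origin,
# so after time 1 each coordinate is roughly exp(-1) of its start.
x_final = euler_sample(lambda t, x: -x, np.ones(3), n_steps=1000)
```

For this linear toy field the Euler iterates satisfy $x_{k+1} = (1 - 1/N)\,x_k$, so the endpoint approaches $e^{-1} x_0$ as $N$ grows, matching the exact flow.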
Unfolding with a Wasserstein Loss
Craig, Katy, Faktor, Benjamin, Nachman, Benjamin
Data unfolding -- the removal of noise or artifacts from measurements -- is a fundamental task across the experimental sciences. Of particular interest in the present work are applications of data unfolding in physics, in which context the dominant approach is Richardson-Lucy (RL) deconvolution. The classical RL approach aims to find denoised data that, once passed through the noise model, is as close as possible to the measured data, in terms of Kullback-Leibler (KL) divergence. Fundamental to this approach is the hypothesis that the support of the measured data overlaps with the output of the noise model, so that the KL divergence correctly captures their similarity. In practice, this hypothesis is typically enforced by binning the measured data and noise model, introducing numerical error into the unfolding process. As a counterpoint to classical binned methods for unfolding, the present work studies an alternative formulation of the unfolding problem, using a Wasserstein loss instead of the KL divergence to quantify the similarity between the measured data and the output of the noise model. We establish sharp conditions for existence and uniqueness of optimizers; as a consequence we answer open questions of Li et al. [23] regarding necessary conditions for existence and uniqueness in the case of transport map noise models. Following these theoretical results, we then develop a provably convergent generalized Sinkhorn algorithm to compute approximate optimizers. Our algorithm requires only empirical observations of the noise model and measured data and scales with the size of the data, rather than the ambient dimension.
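For readers unfamiliar with Sinkhorn-type methods, the following is a minimal sketch of the *standard* entropic-regularized Sinkhorn iteration between two empirical measures -- an illustration of the algorithmic family, not the paper's generalized algorithm. The variable names (`a`, `b` for marginals, `C` for the cost matrix, `eps` for the entropic regularization) are conventional choices, not taken from the paper.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, n_iters=200):
    """Classical Sinkhorn iterations for entropic optimal transport.

    a, b : probability vectors (the two empirical marginals)
    C    : pairwise cost matrix
    eps  : entropic regularization strength
    Returns an approximate optimal transport plan P with marginals a, b.
    """
    K = np.exp(-C / eps)           # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)          # match column marginal
        u = a / (K @ v)            # match row marginal
    return u[:, None] * K * v[None, :]

# Two empirical measures supported on {0, 1} with squared-distance cost.
x = np.array([0.0, 1.0])
y = np.array([0.0, 1.0])
C = (x[:, None] - y[None, :]) ** 2
P = sinkhorn(np.array([0.5, 0.5]), np.array([0.5, 0.5]), C)
```

With a small regularization the plan concentrates near the (here diagonal) optimal matching; the iteration cost scales with the number of sample points, echoing the abstract's point that the method scales with the size of the data rather than the ambient dimension.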
SympFormer: Accelerated attention blocks via Inertial Dynamics on Density Manifolds
Stein, Viktor, Li, Wuchen, Steidl, Gabriele
Transformers owe much of their empirical success in natural language processing to the self-attention blocks. Recent perspectives interpret attention blocks as interacting particle systems, whose mean-field limits correspond to gradient flows of interaction energy functionals on probability density spaces equipped with Wasserstein-$2$-type metrics. We extend this viewpoint by introducing accelerated attention blocks derived from inertial Nesterov-type dynamics on density spaces. In our proposed architecture, tokens carry both spatial (feature) and velocity variables. The time discretization and the approximation of accelerated density dynamics yield Hamiltonian momentum attention blocks, which constitute the proposed accelerated attention architectures. In particular, for linear self-attention, we show that the attention blocks approximate a Stein variational gradient flow, using a bilinear kernel, of a potential energy. In this setting, we prove that elliptically contoured probability distributions are preserved by the accelerated attention blocks. We present implementable particle-based algorithms and demonstrate that the proposed accelerated attention blocks converge faster than the classical attention blocks while preserving the number of oracle calls.
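The "tokens as interacting particles" viewpoint in the abstract can be sketched concretely: one residual self-attention update moves each token toward an attention-weighted average of the others, which is a single discrete step of a particle flow. This is a minimal illustration of the interacting-particle interpretation only, not the paper's Hamiltonian momentum attention blocks; the step size `tau` and identity projection matrices are illustrative assumptions.

```python
import numpy as np

def attention_particle_step(X, Q, K, V, tau=0.1):
    """One flow-style step of softmax self-attention on token 'particles'.

    X : (n, d) array of token positions (features)
    Q, K, V : (d, d) projection matrices
    tau : step size of the discrete-time particle dynamics
    """
    scores = (X @ Q) @ (X @ K).T / np.sqrt(X.shape[1])
    W = np.exp(scores - scores.max(axis=1, keepdims=True))
    W = W / W.sum(axis=1, keepdims=True)   # row-stochastic attention weights
    # Residual update: each particle drifts toward its attention average.
    return X + tau * (W @ (X @ V) - X)

rng = np.random.default_rng(0)
d = 4
X = rng.standard_normal((8, d))
Q = K = V = np.eye(d)                      # illustrative projections
X1 = attention_particle_step(X, Q, K, V)
```

The accelerated architectures in the paper augment this picture with a velocity variable per token, turning the first-order drift above into second-order, Nesterov-type dynamics.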