probability measure
On the Stability of Spherical Hellinger-Kantorovich Flows and Their Implications for Differential Privacy
Mustafi, Aratrika, Mukherjee, Soumya
We consider the problem of sampling from an unnormalized Boltzmann/ Gibbs density, π(θ) exp V(θ),θ Θ Rd, where the normalization constant is unknown (and/or intractable) and only the potential function V (and typically its derivatives) can be evaluated. This problem arises across various domains in Bayesian inference, statistical physics, and modern machine learning. A common variational perspective on sampling is to characterize the target distribution π as the unique minimizer of a functional (typically a divergence functional) over the space of probability measures. From this viewpoint, sampling can be formulated as evolving an initial distribution ρ0 toward π via the gradient flow of this functional under a suitable geometric structure on the space of probability measures. In this paper, we focus on a gradient flow based sampling methodology built from the spherical Hellinger Kantorovich (SHK), also known as the Wasserstein Fisher Rao (WFR), geometry on the space of probability measures (Kondratyev and Vorotnikov, 2019; Liero et al., 2018; Chizat et al., 2015). When the variational objective is the exclusive KL divergence ρ 7 KL(ρ π), the SHK gradient flow generates a time-indexed family of marginals {ρt}t 0 (initialized at ρ0 P2(Θ)) that evolves according to the continuity reaction equation (4). This evolution is equivalent to the birth-death Langevin dynamics introduced in Lu et al. (2019) .
Sliced-Regularized Optimal Transport
We propose a new regularized optimal transport (OT) formulation, termed sliced-regularized optimal transport (SROT). Unlike entropic OT (EOT), which regularizes the transport plan toward an independent coupling, SROT regularizes it toward a smoothened sliced OT (SOT) plan. To the best of our knowledge, SROT is the first approach to leverage a version of SOT plan as a reference to improve classical OT. We provide a formal definition of SROT, derive its dual formulation, and provide a post-Bayesian interpretation of SROT. We then develop a Sinkhorn-style algorithm for efficient computation, retaining the same scalability advantages as EOT. By incorporating a scalable SOT plan as a prior, SROT yields more accurate approximations of the exact OT plan than EOT under the same level of regularization. Moreover, the resulting transport plan improves upon the reference SOT plan itself. We further introduce the corresponding OT divergence induced by SROT, named SROT divergence, and analyze its topological and computational properties. Finally, we validate our approach through experiments on synthetic datasets and color transfer tasks, demonstrating that SROT is better than both EOT and SOT in approximating exact OT. Additional experiments on gradient flows further highlight the advantages of SROT divergence.
On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures
Lin, Likun, Wang, Zhongjian, Xin, Jack, Zhang, Zhiwen
Despite the remarkable empirical success of generative models, the available theory on their statistical accuracy in scientific computing remains largely pessimistic. This paper develops a theoretical framework for understanding the regularity of transport maps and the generalization properties of one-step Wasserstein-guided generative models for PDE-induced probability measures. We consider normalized target densities associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker--Planck equations on the torus. Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a representative instance, we study DeepParticle and derive excess-risk bounds characterizing the discrepancy between the learned map and the population-optimal map. We also establish a robustness estimate under target shift and illustrate the theory with experiments which support the derived rates.
Optimizing Computational-Statistical Runtime for Wasserstein Distance Estimation
Jacobs, Peter Matthew, Phillips, Jeff M.
Squared Wasserstein distance is a frequently used tool to measure discrepancy between probability distributions. This distance is typically computed between empirical measures of size $n$ from two underlying random samples. Unfortunately, even in lower dimensional Euclidean space problems $\left( d \in \{2,3\} \right)$, algorithms for Wasserstein distance computation with approximate or exact precision guarantees scale poorly in the runtime as a function of $n$ and the desired precision. In response, we consider the computational-statistical runtime, where the goal is to estimate from samples the Wasserstein distance between potentially smooth measures up to $ε$-additive error in expectation with respect to the sampling; we allow $O(1)$ computational cost for collecting a sample. Towards this, we develop a Sample-Sketch-Solve paradigm where we introduce a regular cartesian grid sketch of the samples. We show that (especially under $α$-Hölder smooth distributions) this can compress the data without increasing asymptotic error, and also regularizes the structure which enables faster exact algorithms. Ultimately, we approximate $W_2^2(P,Q)$ within $ε$ error in $ε^{-\max(2,\frac{d+1+o(1)}{1+α})}$ time for $0 < α< 1$ Hölder smooth distributions $P,Q$ on $(0,1)^{d}$; an optimal $Θ(ε^{-2})$ for $α> 1/2$ when $d=2$ and nearly optimal as $α\to 1$ when $d = 3$.
Identifiability and Stability of Generative Drifting with Companion-Elliptic Kernel Families
This paper studies the identifiability and stability of drifting fields within the framework of Generative Modeling via Drifting. The motivating question is whether a zero-drift equilibrium identifies the target distribution, and whether an approximate zero drift implies weak distributional convergence. Since the original drifting model employs the Laplace kernel by default, we first analyze why standard Gaussian score-based arguments fail to apply. This analysis motivates the introduction of companion-elliptic kernel families, which are characterized by a companion potential satisfying an elliptic closure relation. We show that this class naturally contains the Laplace kernel and consists precisely of Gaussian and Matérn kernels with smoothness parameter $ν\ge 1/2$. Within this class, we establish field identifiability for arbitrary Borel probability measures on $\mathbb{R}^d$: if the drifting field vanishes identically, then the two measures must coincide. As for stability, we demonstrate that field convergence alone does not guarantee weak convergence, since mass may escape to infinity while remaining invisible to the field. Although tightness of the sequence directly removes this obstruction and restores weak stability, we prove that, even without tightness, every $C_0$-vague cluster point lies exactly on the defect ray $\{cp:0\le c\le1\}$. Consequently, a single scalar $C_0$-observable suffices to detect the missing mass and recover weak convergence.
Differentially Private Sampling from Distributions via Wasserstein Projection
Takakura, Shokichi, Liew, Seng Pei, Hasegawa, Satoshi
In this paper, we study the problem of sampling from a distribution under the constraint of differential privacy (DP). Prior works measure the utility of DP sampling with density ratio-based measures such as KL divergence. However, such formulations suffer from two key limitations: 1) they fail to capture the geometric structure of the support, and 2) they are not applicable when the supports of the distributions differ. To deal with these issues, we develop a novel framework for DP sampling with Wasserstein distance as the utility measure. In this formulation, we propose Wasserstein Projection Mechanism (WPM), a minimax optimal mechanism based on Wasserstein projection. Furthermore, we develop efficient algorithms for computing the proposed mechanisms approximately and provide convergence guarantees.
Ratio-based Loss Functions
Helgerth, Lena, Christmann, Andreas
Algorithms in machine learning and AI do critically depend on at least three key components: (i) the risk function, which is the expectation of the loss function, (ii) the function space, which is often called the hypothesis space, and (iii) the set of probability measures, which are allowed for the specified algorithm. This paper gives a survey of a certain class of loss functions, which we call ratio-based. In supervised learning, margin-based loss functions for classification tasks depending on the product of the output values $y_i$ and the predictions $f(x_i)$ as well as distance-based loss functions depending on the difference of $y_i$ and $f(x_i)$ for regression are common. Distance-based loss functions are in particular useful, if an additive model assumption seems plausible, i.e. the common signal plus noise assumption. However, in the literature, several loss functions proposed for regression purposes have a multiplicative error structure in mind and pay attention to relative errors, i.e. to the ratio of $y_i$ and $f(x_i)$. In this survey article, we systematically investigate such ratio-based loss functions and propose a few new losses, which may be interesting for future research. We concentrate on investigating general properties of ratio-based loss functions like continuity, Lipschitz-continuity, convexity, and differentiability, because these properties play a central role in most machine learning algorithms. Therefore, we do not focus on some specific machine learning algorithm to derive universal consistency, learning rates, or stability results. Instead, we want to enable future research in this direction.
Gaussian mixture models in Hilbert spaces via kernel methods
López-Montero, Daniel, Álvarez-López, Antonio, Matabuena, Marcos
Modern datasets across many disciplines increasingly consist of time-evolving, potentially infinite-dimensional random objects, such as dynamic functional data, which are naturally modeled in Hilbert spaces. In these settings, characterizing probability measures, for example, through densities, can be ill-defined or technically challenging. Motivated by clustering applications, we propose a Gaussian mixture framework for Hilbert-space-valued data based on kernel mean embeddings and develop efficient optimization algorithms for estimation. We establish theoretical guarantees showing that the proposed algorithm is well defined and that the model yields a dense class of approximations in infinite-dimensional spaces. We evaluate the framework through extensive experiments on diverse structures and data geometries, including $L^2$-functional data and random graphs in Laplacian spaces arising in modern medical applications.
Entropic Riemannian Neural Optimal Transport
Micheli, Alessandro, Sapora, Silvia, Monod, Anthea, Bhatt, Samir
Many machine learning problems involve data supported on curved spaces such as spheres, rotation groups, hyperbolic spaces, and general Riemannian manifolds, where Euclidean geometry can distort distances, averages, and the resulting optimal transport (OT) problem. Existing manifold OT methods have pursued amortized out-of-sample maps, while entropic regularization has made discrete OT more scalable, but these advantages have remained largely disjoint. We propose Entropic Riemannian Neural Optimal Transport (Entropic RNOT), a unified framework that combines intrinsic entropic OT with amortized out-of-sample evaluation on Riemannian manifolds. Our method learns a single target-side Schrödinger potential through a neural pullback parameterization, recovers the induced Gibbs coupling, and uses the resulting conditional laws to construct intrinsic transport surrogates. These include barycentric projections on Cartan-Hadamard manifolds and heat-smoothed conditional surrogates on stochastically complete manifolds, the latter turning possibly atomic target laws into absolutely continuous ones. For fixed regularization $\varepsilon>0$, we prove that the proposed hypothesis class recovers the entropic optimal coupling in strong probabilistic metrics. As consequences, barycentric surrogates converge in $L^2$, while heat-smoothed surrogates are stable at fixed heat time and asymptotically unbiased as the heat time vanishes. The guarantees hold for compactly supported data on possibly noncompact manifolds. Empirically, our method matches or improves over Euclidean, tangent-space, and log-Euclidean baselines on benchmarks over $\mathbb{S}^2$, $\mathrm{SO}(3)$, $\mathrm{SPD}(3)$, $\mathrm{SE}(3)$, and $\mathbb{H}^2$, scales favorably relative to discrete manifold Sinkhorn, and in a protein-ligand docking application, refines poses on $\mathrm{SE}(3)$ without retraining or per-instance optimization.