proximal sampler
Functional Stochastic Localization
Gu, Anming, Shi, Bobby, Tian, Kevin
Eldan's stochastic localization is a probabilistic construction that has proved instrumental to modern breakthroughs in high-dimensional geometry and the design of sampling algorithms. Motivated by sampling under non-Euclidean geometries and the mirror descent algorithm in optimization, we develop a functional generalization of Eldan's process that replaces Gaussian regularization with regularization by any positive integer multiple of a log-Laplace transform. We further give a mixing time bound on the Markov chain induced by our localization process, which holds if our target distribution satisfies a functional Poincaré inequality. Finally, we apply our framework to differentially private convex optimization in $\ell_p$ norms for $p \in [1, 2)$, where we improve state-of-the-art query complexities in a zeroth-order model.
A transport approach to the cutoff phenomenon
Pedrotti, Francesco, Salez, Justin
Substantial progress has recently been made in the understanding of the cutoff phenomenon for Markov processes, using an information-theoretic statistic known as varentropy [Sal23; Sal24; Sal25a; PS25]. In the present paper, we propose an alternative approach which bypasses the use of varentropy and instead exploits a new W-TV transport inequality, combined with a classical parabolic regularization estimate [BGL01; OV01]. While currently restricted to non-negatively curved processes on smooth spaces, our argument no longer requires the chain rule, nor any approximate version thereof. As applications, we recover the main result of [Sal25a] establishing cutoff for the log-concave Langevin dynamics, and extend the conclusion to a widely used discrete-time sampling algorithm known as the Proximal Sampler.
Mixing Time of the Proximal Sampler in Relative Fisher Information via Strong Data Processing Inequality
We study the mixing time guarantee for sampling in relative Fisher information via the Proximal Sampler algorithm, which is an approximate proximal discretization of the Langevin dynamics. We show that when the target probability distribution is strongly log-concave, the relative Fisher information converges exponentially fast along the Proximal Sampler; this matches the exponential convergence rate of the relative Fisher information along the continuous-time Langevin dynamics for a strongly log-concave target. When combined with a standard implementation of the Proximal Sampler via rejection sampling, this exponential convergence rate provides a high-accuracy iteration complexity guarantee for the Proximal Sampler in relative Fisher information when the target distribution is strongly log-concave and log-smooth. Our proof proceeds by establishing a strong data processing inequality for relative Fisher information along the Gaussian channel under strong log-concavity, and a data processing inequality along the reverse Gaussian channel for a special distribution. The forward and reverse Gaussian channels compose to form the Proximal Sampler, and these data processing inequalities imply the exponential convergence rate of the relative Fisher information along the Proximal Sampler.
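The composition of a forward and a reverse Gaussian channel described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a convex potential `f` with target density proportional to `exp(-f)`, and it assumes the proximal point `prox(y, h) = argmin_x f(x) + |x - y|^2 / (2h)` is available exactly (here through a closed form for a diagonal quadratic). The names `proximal_sampler`, `prox`, and the test potential are hypothetical.

```python
import numpy as np

def proximal_sampler(f, grad_f, prox, x0, h, n_iters, rng):
    """Proximal Sampler sketch for a convex potential f; returns all iterates."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        # Forward Gaussian channel: y ~ N(x, h I).
        y = x + np.sqrt(h) * rng.standard_normal(x.size)
        # Reverse channel: sample x ~ exp(-f(x) - |x - y|^2 / (2h)) by rejection.
        x_star = prox(y, h)  # assumed exact minimizer of the regularized potential
        f_star, g_star = f(x_star), grad_f(x_star)
        while True:
            # Proposal N(x_star, h I); accept with probability exp(-Bregman),
            # where Bregman >= 0 because f is convex.
            z = x_star + np.sqrt(h) * rng.standard_normal(x.size)
            bregman = f(z) - f_star - g_star @ (z - x_star)
            if rng.exponential() >= bregman:  # P(Exp(1) >= b) = exp(-b)
                break
        x = z
        samples[k] = x
    return samples

# Hypothetical test target: Gaussian with precision diag(a), so f(x) = 0.5 * sum(a * x^2).
a = np.array([1.0, 4.0])
f = lambda x: 0.5 * np.sum(a * x**2)
grad_f = lambda x: a * x
prox = lambda y, h: y / (1.0 + h * a)  # closed-form proximal point for this f
```

Calling `proximal_sampler(f, grad_f, prox, np.zeros(2), 0.1, 20000, np.random.default_rng(0))` should produce iterates whose per-coordinate variances approach `1 / a`. For a general log-smooth target, the proximal point would have to be computed numerically and the step size `h` kept small enough for the rejection step to accept often.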
From Optimization to Sampling via Lyapunov Potentials
Chen, August Y., Sridharan, Karthik
We study the problem of sampling from high-dimensional distributions using Langevin Dynamics, a natural and popular variant of Gradient Descent where at each step, appropriately scaled Gaussian noise is added. The similarities between Langevin Dynamics and both Gradient Flow and Gradient Descent lead to the natural question: if the distribution's log-density can be optimized from all initializations via Gradient Flow and Gradient Descent, given oracle access to the gradients, can we efficiently sample from the distribution using discrete-time Langevin Dynamics? We answer this question in the affirmative for distributions that are unimodal in a particular sense, at low but appropriate temperature levels natural in the context of both optimization and real-world applications, under mild regularity assumptions on the measure and the convergence rate of Gradient Flow. We do so by using the result of De Sa, Kale, Lee, Sekhari, and Sridharan (2022) that the success of optimization implies particular geometric properties involving a Lyapunov Potential. These geometric properties from optimization in turn give us strong quantitative control over isoperimetric constants of the measure. As a corollary, we show we can efficiently sample from several new natural and interesting classes of non-log-concave densities, an important setting where we have relatively few examples. Another corollary is efficient discrete-time sampling results for log-concave measures satisfying milder regularity conditions than smoothness, similar to the results of Lehec (2023).
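The description of Langevin Dynamics as Gradient Descent plus scaled Gaussian noise can be made concrete with a short sketch. It assumes a target proportional to `exp(-beta * f)`, where the inverse temperature `beta` and the function names are illustrative choices rather than the paper's notation.

```python
import numpy as np

def langevin_step(x, grad_f, step, beta, rng):
    """One discrete-time Langevin step: a gradient step plus Gaussian noise."""
    noise = np.sqrt(2.0 * step / beta) * rng.standard_normal(x.shape)
    return x - step * grad_f(x) + noise

def run_langevin(x0, grad_f, step, beta, n_steps, rng):
    """Iterate langevin_step from x0; beta = inf recovers plain Gradient Descent."""
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        x = langevin_step(x, grad_f, step, beta, rng)
    return x
```

Lower temperature (larger `beta`) shrinks the noise term, so the iterates concentrate around minimizers of `f`; in the limit `beta = np.inf` the noise vanishes and the update is exactly Gradient Descent.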
Fast Convergence of $\Phi$-Divergence Along the Unadjusted Langevin Algorithm and Proximal Sampler
Mitra, Siddharth, Wibisono, Andre
We study the mixing time of two popular discrete-time Markov chains in continuous space, the unadjusted Langevin algorithm and the proximal sampler, which are discretizations of the Langevin dynamics. We extend mixing time analyses for these Markov chains to hold in $\Phi$-divergence. We show that any $\Phi$-divergence arising from a twice-differentiable strictly convex function $\Phi$ converges to $0$ exponentially fast along these Markov chains, under the assumption that their stationary distributions satisfy the corresponding $\Phi$-Sobolev inequality. Our rates of convergence are tight and include as special cases popular mixing time regimes, namely the mixing in chi-squared divergence under a Poincaré inequality, and the mixing in relative entropy under a log-Sobolev inequality. Our results follow by bounding the contraction coefficients arising in the appropriate strong data processing inequalities.
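For orientation, the two special cases named above can be read off from the usual definition of a $\Phi$-divergence. The normalization below is the standard textbook one and is an assumption here; the paper's own convention may differ by an additive constant.

$$
D_\Phi(\mu \,\|\, \nu) = \mathbb{E}_{\nu}\!\left[\Phi\!\left(\frac{d\mu}{d\nu}\right)\right] - \Phi(1),
\qquad
\Phi(x) = (x-1)^2 \;\Rightarrow\; D_\Phi = \chi^2(\mu \,\|\, \nu),
\qquad
\Phi(x) = x \log x \;\Rightarrow\; D_\Phi = \mathsf{KL}(\mu \,\|\, \nu).
$$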
Rényi-infinity constrained sampling with $d^3$ membership queries
Kook, Yunbum, Zhang, Matthew S.
Uniform sampling over a convex body is a fundamental algorithmic problem, yet the convergence in KL or Rényi divergence of most samplers remains poorly understood. In this work, we propose a constrained proximal sampler, a principled and simple algorithm that possesses elegant convergence guarantees. Leveraging the uniform ergodicity of this sampler, we show that it converges in the Rényi-infinity divergence ($\mathcal R_\infty$) with no query complexity overhead when starting from a warm start. This is the strongest of commonly considered performance metrics, implying rates in $\{\mathcal R_q, \mathsf{KL}\}$ convergence as special cases. By applying this sampler within an annealing scheme, we propose an algorithm which can approximately sample $\varepsilon$-close to the uniform distribution on convex bodies in $\mathcal R_\infty$-divergence with $\widetilde{\mathcal{O}}(d^3\, \text{polylog} \frac{1}{\varepsilon})$ query complexity. This improves on all prior results in $\{\mathcal R_q, \mathsf{KL}\}$-divergences, without resorting to any algorithmic modifications or post-processing of the sample. It also matches the prior best known complexity in total variation distance.
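A proximal sampler restricted to a convex body can be sketched with nothing but a membership oracle. The code below is an illustrative simplification, not the authors' algorithm: `in_body` is a hypothetical membership oracle, and when the retry budget `max_tries` is exhausted the sketch simply keeps the current point, which is an assumption made here for brevity and slightly perturbs the stationary distribution.

```python
import numpy as np

def constrained_proximal_sampler(in_body, x0, h, n_iters, rng, max_tries=1000):
    """Proximal sampler sketch for the uniform distribution on a convex body."""
    x = np.asarray(x0, dtype=float)
    samples = np.empty((n_iters, x.size))
    for k in range(n_iters):
        # Forward step: y ~ N(x, h I); y may leave the body.
        y = x + np.sqrt(h) * rng.standard_normal(x.size)
        # Backward step: sample N(y, h I) restricted to the body by rejection,
        # spending one membership query per proposal.
        for _ in range(max_tries):
            z = y + np.sqrt(h) * rng.standard_normal(x.size)
            if in_body(z):
                x = z
                break
        # If no proposal landed in the body, x is left unchanged (simplification).
        samples[k] = x
    return samples

# Hypothetical membership oracle for the Euclidean unit ball.
in_unit_ball = lambda p: p @ p <= 1.0
```

Every stored iterate lies inside the body by construction. The retry cap matters in practice, because the number of proposals needed can be very large when `y` lands far outside the body.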
A Separation in Heavy-Tailed Sampling: Gaussian vs. Stable Oracles for Proximal Samplers
He, Ye, Mousavi-Hosseini, Alireza, Balasubramanian, Krishnakumar, Erdogdu, Murat A.
We study the complexity of heavy-tailed sampling and present a separation result in terms of obtaining high-accuracy versus low-accuracy guarantees, i.e., samplers that require only $O(\log(1/\varepsilon))$ versus $\Omega(\text{poly}(1/\varepsilon))$ iterations to output a sample which is $\varepsilon$-close to the target in $\chi^2$-divergence. Our results are presented for proximal samplers that are based on Gaussian versus stable oracles. We show that proximal samplers based on the Gaussian oracle have a fundamental barrier in that they necessarily achieve only low-accuracy guarantees when sampling from a class of heavy-tailed targets. In contrast, proximal samplers based on the stable oracle exhibit high-accuracy guarantees, thereby overcoming the aforementioned limitation. We also prove lower bounds for samplers under the stable oracle and show that our upper bounds cannot be fundamentally improved.
Faster Sampling via Stochastic Gradient Proximal Sampler
Huang, Xunpeng, Zou, Difan, Ma, Yi-An, Dong, Hanze, Zhang, Tong
Stochastic gradients have been widely integrated into Langevin-based methods to improve their scalability and efficiency in solving large-scale sampling problems. However, the proximal sampler, which exhibits much faster convergence than Langevin-based algorithms in the deterministic setting (Lee et al., 2021), has yet to be explored in its stochastic variants. In this paper, we study the Stochastic Proximal Samplers (SPS) for sampling from non-log-concave distributions. We first develop a general framework for implementing stochastic proximal samplers and establish the convergence theory accordingly. We show that the convergence to the target distribution can be guaranteed as long as the second moment of the algorithm trajectory is bounded and restricted Gaussian oracles can be well approximated. We then provide two implementable variants based on stochastic gradient Langevin dynamics (SGLD) and the Metropolis-adjusted Langevin algorithm (MALA), giving rise to SPS-SGLD and SPS-MALA. We further show that SPS-SGLD and SPS-MALA can achieve $\epsilon$-sampling error in total variation (TV) distance within $\tilde{\mathcal{O}}(d\epsilon^{-2})$ and $\tilde{\mathcal{O}}(d^{1/2}\epsilon^{-2})$ gradient complexities, respectively, which outperform the best-known result by at least an $\tilde{\mathcal{O}}(d^{1/3})$ factor. This enhancement in performance is corroborated by our empirical studies on synthetic data with various dimensions, demonstrating the efficiency of our proposed algorithm.