Symbolic Density Estimation for Discrete Distributions
Discrete probability laws underpin statistical modeling, yet the catalog of interpretable distributions has expanded only gradually through centuries of case-by-case mathematical derivations. We introduce symbolic density estimation (SDE), an unsupervised framework that automatically recovers closed-form probability mass functions by composing elementary analytic operations within a structured search space. Our method integrates domain-specific structural priors with evolutionary search and a validity-aware inference stage, and it extends to richer distribution families such as zero inflation and finite mixtures. To support systematic evaluation and future research, we contribute a benchmark dataset spanning a broad collection of commonly used discrete distributions. The proposed algorithm recovers all benchmark families with accurate parameter estimates. A real data application shows that it identifies concise and interpretable mixture models that improve goodness-of-fit over standard models.
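As an illustration of the kind of closed-form probability mass function the abstract refers to, the sketch below writes out a zero-inflated Poisson PMF and its negative log-likelihood. This is not the SDE search procedure itself; the function names `zip_pmf` and `neg_log_likelihood` and the sample counts are hypothetical, chosen only to show what a recovered zero-inflated expression could look like.

```python
import math

def zip_pmf(k, pi, lam):
    """Zero-inflated Poisson PMF: extra mass `pi` at zero, Poisson(lam) otherwise."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    # (k == 0) is a bool, so the inflation term only contributes at k = 0.
    return pi * (k == 0) + (1.0 - pi) * poisson

def neg_log_likelihood(counts, pi, lam):
    """Negative log-likelihood of observed counts under the zero-inflated Poisson."""
    return -sum(math.log(zip_pmf(k, pi, lam)) for k in counts)

# Hypothetical count data with more zeros than a plain Poisson would predict.
counts = [0, 0, 0, 1, 2, 0, 3, 0, 1, 2]
nll_inflated = neg_log_likelihood(counts, pi=0.3, lam=2.0)
nll_plain = neg_log_likelihood(counts, pi=0.0, lam=2.0)
```

Comparing `nll_inflated` with `nll_plain` gives a rough sense of how a goodness-of-fit criterion could separate a zero-inflated candidate from its base family.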
A note on connections between the Föllmer process and the denoising diffusion probabilistic model
The Föllmer process is a Brownian motion conditioned to have a pre-specified distribution at time 1. This process can be interpreted as an "augmented" time-compressed version of the reverse stochastic differential equation (SDE) for the denoising diffusion probabilistic model (DDPM). While this fact has been indirectly used to analyze DDPM sampling errors via discretization of the reverse SDE, connections between direct discretization of the Föllmer process and the DDPM sampler have not yet been fully explored. This note aims to clarify this point while surveying relevant results from existing work. We show that discretized Föllmer processes give natural hyper-parameter settings of the DDPM sampler. Moreover, this allows us to systematically recover state-of-the-art results on DDPM sampling error bounds with slight improvements.
Efficient Sampling on Riemannian Manifolds via Langevin MCMC
We study the task of efficiently sampling from a Gibbs distribution $d\pi^* = e^{-h}\, d\mathrm{vol}_g$ over a Riemannian manifold $M$ via (geometric) Langevin MCMC; this algorithm involves computing exponential maps in random Gaussian directions and is efficiently implementable in practice. The key to our analysis of Langevin MCMC is a bound on the discretization error of the geometric Euler-Maruyama scheme, assuming $\nabla h$ is Lipschitz and $M$ has bounded sectional curvature. Our error bound matches the error of Euclidean Euler-Maruyama in terms of its stepsize dependence. Combined with a contraction guarantee for the geometric Langevin diffusion under Kendall-Cranston coupling, we prove that the Langevin MCMC iterates lie within $\epsilon$-Wasserstein distance of $\pi^*$ after $\tilde{O}(\epsilon^{-2})$ steps, which matches the iteration complexity for Euclidean Langevin MCMC. Our results apply in general settings where $h$ can be nonconvex and $M$ can have negative Ricci curvature. Under additional assumptions that the Riemannian curvature tensor has bounded derivatives, and that $\pi^*$ satisfies a $CD(\cdot, \infty)$ condition, we analyze the stochastic gradient version of Langevin MCMC, and bound its iteration complexity by $\tilde{O}(\epsilon^{-2})$ as well.
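The step described above (move along the exponential map in a direction made of a gradient term plus tangent Gaussian noise) can be sketched on the unit sphere, where the exponential map has a closed form. This is a minimal illustration under stated assumptions, not the paper's implementation: the manifold (the sphere in R^3), the potential h(x) = -kappa * <mu, x>, the step size, and all function names are choices made for this example.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: follow the geodesic from x along tangent v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)

def langevin_step(x, grad_h, step, rng):
    """One geometric Euler-Maruyama step: x <- Exp_x(-step * grad h + sqrt(2 * step) * xi)."""
    proj = np.eye(x.size) - np.outer(x, x)        # projection onto the tangent space at x
    drift = -step * (proj @ grad_h(x))            # Riemannian gradient = projected Euclidean gradient
    noise = np.sqrt(2.0 * step) * (proj @ rng.standard_normal(x.size))
    return sphere_exp(x, drift + noise)

def run_chain(x0, grad_h, step, n_steps, seed=0):
    """Run the chain and return every iterate as a row of the output array."""
    rng = np.random.default_rng(seed)
    xs = np.empty((n_steps + 1, x0.size))
    xs[0] = x0
    for k in range(n_steps):
        xs[k + 1] = langevin_step(xs[k], grad_h, step, rng)
    return xs

# Assumed example target: density proportional to exp(kappa * <mu, x>) on the sphere.
mu = np.array([0.0, 0.0, 1.0])
kappa = 5.0
grad_h = lambda x: -kappa * mu                    # Euclidean gradient of h(x) = -kappa * <mu, x>
samples = run_chain(mu, grad_h, step=0.01, n_steps=2000)
```

Because each update goes through the exponential map, the iterates remain on the manifold without any separate re-projection step; on manifolds without a closed-form exponential map, that computation would have to be approximated.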
Maximum Likelihood Training of Score-Based Diffusion Models
Score-based diffusion models synthesize samples by reversing a stochastic process that diffuses data to noise, and are trained by minimizing a weighted combination of score matching losses. The log-likelihood of score-based diffusion models can be tractably computed through a connection to continuous normalizing flows, but log-likelihood is not directly optimized by the weighted combination of score matching losses. We show that for a specific weighting scheme, the objective upper bounds the negative log-likelihood, thus enabling approximate maximum likelihood training of score-based diffusion models. We empirically observe that maximum likelihood training consistently improves the likelihood of score-based diffusion models across multiple datasets, stochastic processes, and model architectures. Our best models achieve negative log-likelihoods of 2.83 and 3.76 bits/dim on CIFAR-10 and ImageNet 32×32 without any data augmentation, on a par with state-of-the-art autoregressive models on these tasks.
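The bound mentioned above can be written out as follows. The notation is an assumption of this sketch and does not appear in the abstract: the forward SDE is $dx = f(x,t)\,dt + g(t)\,dw$ on $[0, T]$ with marginals $p_t$ (so $p_0 = p$ is the data distribution), $\pi$ is the prior, $s_\theta$ is the score model, and $p_\theta^{\mathrm{SDE}}$ is the distribution of samples produced by the learned reverse SDE. The weighting in question is taken here to be $\lambda(t) = g(t)^2$.

```latex
\mathcal{J}_{\mathrm{SM}}(\theta; \lambda)
  = \frac{1}{2} \int_0^T \mathbb{E}_{p_t(x)}
    \left[ \lambda(t) \left\| \nabla_x \log p_t(x) - s_\theta(x, t) \right\|_2^2 \right] dt,
\qquad
D_{\mathrm{KL}}\!\left(p \,\|\, p_\theta^{\mathrm{SDE}}\right)
  \le \mathcal{J}_{\mathrm{SM}}\!\left(\theta; g(\cdot)^2\right)
    + D_{\mathrm{KL}}\!\left(p_T \,\|\, \pi\right).
```

Since $D_{\mathrm{KL}}(p \,\|\, p_\theta^{\mathrm{SDE}}) = -\mathcal{H}(p) - \mathbb{E}_{p}[\log p_\theta^{\mathrm{SDE}}(x)]$ and the entropy $\mathcal{H}(p)$ does not depend on $\theta$, an upper bound on the KL divergence is an upper bound on the expected negative log-likelihood up to a constant.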
Scalable Inference in SDEs by Direct Matching of the Fokker-Planck-Kolmogorov Equation
Simulation-based techniques such as variants of stochastic Runge-Kutta are the de facto approach for inference with stochastic differential equations (SDEs) in machine learning. These methods are general-purpose and are used with parametric and non-parametric models, as well as neural SDEs. Stochastic Runge-Kutta relies on sampling schemes that can be inefficient in high dimensions. We address this issue by revisiting the classical SDE literature and deriving direct approximations to the (typically intractable) Fokker-Planck-Kolmogorov equation by matching moments. We show that this workflow is fast, scales to high-dimensional latent spaces, and is applicable to scarce-data applications, where a non-parametric SDE with a driving Gaussian process velocity field specifies the model.
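The idea of replacing sample paths with deterministic moment equations can be illustrated on the one case where it is exact: a scalar Ornstein-Uhlenbeck SDE dx = -theta * x dt + sigma dW, whose mean m and variance P obey dm/dt = -theta * m and dP/dt = -2 * theta * P + sigma^2. The sketch below is only this linear special case under assumed parameter values; for nonlinear drifts the expectations in the moment equations require approximation, and the function names here are hypothetical rather than the paper's.

```python
import math

def ou_moment_odes(m, P, theta, sigma):
    """Right-hand sides of the mean and variance ODEs for dx = -theta*x dt + sigma dW."""
    dm = -theta * m
    dP = -2.0 * theta * P + sigma**2
    return dm, dP

def propagate_moments(m0, P0, theta, sigma, t_end, n_steps):
    """Integrate the moment ODEs with forward Euler; no random sampling is involved."""
    dt = t_end / n_steps
    m, P = m0, P0
    for _ in range(n_steps):
        dm, dP = ou_moment_odes(m, P, theta, sigma)
        m, P = m + dt * dm, P + dt * dP
    return m, P

def exact_moments(m0, P0, theta, sigma, t):
    """Closed-form mean and variance of the OU process at time t, for comparison."""
    decay = math.exp(-theta * t)
    P_inf = sigma**2 / (2.0 * theta)              # stationary variance
    return m0 * decay, P_inf + (P0 - P_inf) * decay**2

# Assumed example parameters.
m_num, P_num = propagate_moments(2.0, 0.1, theta=1.5, sigma=0.8, t_end=2.0, n_steps=2000)
m_ref, P_ref = exact_moments(2.0, 0.1, theta=1.5, sigma=0.8, t=2.0)
```

The cost of this propagation depends on the number of ODE steps rather than on the number of simulated trajectories, which is the property that makes a moment-based approach attractive in high dimensions.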