mackey
StochasticSteinDiscrepancies
Stein discrepancies (SDs) monitor convergence andnon-convergence inapprox-imate inference when exact integration and sampling are intractable. However,the computation of a Stein discrepancy can be prohibitive if the Stein operator - often a sum over likelihood terms or potentials - is expensive to evaluate.
Low Stein Discrepancy via Message-Passing Monte Carlo
Kirk, Nathan, Rusch, T. Konstantin, Zech, Jakob, Rus, Daniela
Message-Passing Monte Carlo (MPMC) was recently introduced as a novel low-discrepancy sampling approach leveraging tools from geometric deep learning. While originally designed for generating uniform point sets, we extend this framework to sample from general multivariate probability distributions with known probability density function. Our proposed method, Stein-Message-Passing Monte Carlo (Stein-MPMC), minimizes a kernelized Stein discrepancy, ensuring improved sample quality. Finally, we show that Stein-MPMC outperforms competing methods, such as Stein Variational Gradient Descent and (greedy) Stein Points, by achieving a lower Stein discrepancy.
The Polynomial Stein Discrepancy for Assessing Moment Convergence
Srinivasan, Narayan, Sutton, Matthew, Drovandi, Christopher, South, Leah F
We propose a novel method for measuring the discrepancy between a set of samples and a desired posterior distribution for Bayesian inference. Classical methods for assessing sample quality like the effective sample size are not appropriate for scalable Bayesian sampling algorithms, such as stochastic gradient Langevin dynamics, that are asymptotically biased. Instead, the gold standard is to use the kernel Stein Discrepancy (KSD), which is itself not scalable given its quadratic cost in the number of samples. The KSD and its faster extensions also typically suffer from the curse-of-dimensionality and can require extensive tuning. To address these limitations, we develop the polynomial Stein discrepancy (PSD) and an associated goodness-of-fit test. While the new test is not fully convergence-determining, we prove that it detects differences in the first r moments in the Bernstein-von Mises limit. We empirically show that the test has higher power than its competitors in several examples, and at a lower computational cost. Finally, we demonstrate that the PSD can assist practitioners to select hyper-parameters of Bayesian sampling algorithms more efficiently than competitors.
Improved Finite-Particle Convergence Rates for Stein Variational Gradient Descent
Balasubramanian, Krishnakumar, Banerjee, Sayan, Ghosal, Promit
We provide finite-particle convergence rates for the Stein Variational Gradient Descent (SVGD) algorithm in the Kernel Stein Discrepancy ($\mathsf{KSD}$) and Wasserstein-2 metrics. Our key insight is the observation that the time derivative of the relative entropy between the joint density of $N$ particle locations and the $N$-fold product target measure, starting from a regular initial distribution, splits into a dominant `negative part' proportional to $N$ times the expected $\mathsf{KSD}^2$ and a smaller `positive part'. This observation leads to $\mathsf{KSD}$ rates of order $1/\sqrt{N}$, providing a near optimal double exponential improvement over the recent result by~\cite{shi2024finite}. Under mild assumptions on the kernel and potential, these bounds also grow linearly in the dimension $d$. By adding a bilinear component to the kernel, the above approach is used to further obtain Wasserstein-2 convergence. For the case of `bilinear + Mat\'ern' kernels, we derive Wasserstein-2 rates that exhibit a curse-of-dimensionality similar to the i.i.d. setting. We also obtain marginal convergence and long-time propagation of chaos results for the time-averaged particle laws.
Controlling Moments with Kernel Stein Discrepancies
Kanagawa, Heishiro, Barp, Alessandro, Gretton, Arthur, Mackey, Lester
Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant. Notable applications include the diagnosis of approximate MCMC samplers and goodness-of-fit tests for unnormalized statistical models. The present work analyzes the convergence control properties of KSDs. We first show that standard KSDs used for weak convergence control fail to control moment convergence. To address this limitation, we next provide sufficient conditions under which alternative diffusion KSDs control both moment and weak convergence. As an immediate consequence we develop, for each $q > 0$, the first KSDs known to exactly characterize $q$-Wasserstein convergence.
Aim Lab is the virtual gym all the kids attend. It wants to be more than that.
But as the app grew, so too did the vision. Training and warming up was a core part of Aim Lab's appeal. Could data gathered to that end be leveraged for matchmaking, with players' skill profiles used to place them into more satisfying games? There were brief flirtations with fantasy and sportsbetting, which Mackey says "weren't really exciting." In November, Statespace purchased Pro Guides, which operates something like a MasterClass program for esports, as well as coaching that matches professional players with paying clients.
A Fleet of Computers Helps Settle a 90-Year-Old Math Problem
A team of mathematicians has finally finished off Keller's conjecture, but not by working it out themselves. Instead, they taught a fleet of computers to do it for them. Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research develop ments and trends in mathe matics and the physical and life sciences. Keller's conjecture, posed 90 years ago by Ott-Heinrich Keller, is a problem about covering spaces with identical tiles. It asserts that if you cover a two-dimensional space with two-dimensional square tiles, at least two of the tiles must share an edge.