Nüsken, Nikolas
Conditioning Diffusions Using Malliavin Calculus
Pidstrigach, Jakiw, Baker, Elizabeth, Domingo-Enrich, Carles, Deligiannidis, George, Nüsken, Nikolas
In stochastic optimal control and conditional generative modelling, a central computational task is to modify a reference diffusion process to maximise a given terminal-time reward. Most existing methods require this reward to be differentiable, using gradients to steer the diffusion towards favourable outcomes. However, in many practical settings, like diffusion bridges, the reward is singular, taking an infinite value if the target is hit and zero otherwise. We introduce a novel framework, based on Malliavin calculus and path-space integration by parts, that enables the development of methods robust to such singular rewards. This allows our approach to handle a broad range of applications, including classification, diffusion bridges, and conditioning without the need for artificial observational noise. We demonstrate that our approach offers stable and reliable training, outperforming existing techniques.
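To make the singular-reward setting concrete, the sketch below simulates the textbook example of a conditioned diffusion: a Brownian motion forced to hit a fixed target at the terminal time via the closed-form Doob h-transform drift (target - x)/(T - t). This is only a minimal illustration of the kind of bridge the paper aims to handle, not the Malliavin-calculus method itself; all parameters are illustrative.

# Minimal sketch (not the paper's algorithm): Euler-Maruyama simulation of a Brownian
# bridge, i.e. a Brownian motion conditioned on the singular event {X_T = target},
# using the exact Doob h-transform drift.
import numpy as np

def simulate_brownian_bridge(x0=0.0, target=1.0, T=1.0, n_steps=200, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x, path = x0, [x0]
    for k in range(n_steps - 1):            # stop one step early to avoid dividing by zero
        t = k * dt
        drift = (target - x) / (T - t)      # exact conditioning drift for Brownian motion
        x = x + drift * dt + np.sqrt(dt) * rng.standard_normal()
        path.append(x)
    path.append(target)                     # the bridge hits the target exactly at time T
    return np.array(path)

print(simulate_brownian_bridge()[-4:])      # final values converge to the target 1.0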
Transport meets Variational Inference: Controlled Monte Carlo Diffusions
Vargas, Francisco, Padhy, Shreyas, Blessing, Denis, Nüsken, Nikolas
Connecting optimal transport and variational inference, we present a principled and systematic framework for sampling and generative modelling centred around divergences on path space. Our work culminates in the development of the \emph{Controlled Monte Carlo Diffusion} sampler (CMCD) for Bayesian computation, a score-based annealing technique that crucially adapts both forward and backward dynamics in a diffusion model. Along the way, we clarify the relationship between the EM-algorithm and iterative proportional fitting (IPF) for Schr{\"o}dinger bridges, deriving as well a regularised objective that bypasses the iterative bottleneck of standard IPF-updates. Finally, we show that CMCD has a strong foundation in the Jarzynski and Crooks identities from statistical physics, and that it convincingly outperforms competing approaches across a wide array of experiments.
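For orientation, the sketch below implements plain annealed Langevin dynamics with Jarzynski/AIS-style importance weights, interpolating from a standard normal to a shifted Gaussian target; the densities, schedule and step size are illustrative assumptions. CMCD itself additionally learns controlled forward and backward drifts, which this uncontrolled baseline omits.

# Sketch (assumptions, not the CMCD algorithm): annealed Langevin dynamics with
# Jarzynski/AIS importance weights, interpolating from N(0,1) to an unnormalised target.
import numpy as np

def log_prior(x):   return -0.5 * x**2               # N(0, 1), up to an additive constant
def log_target(x):  return -0.5 * (x - 2.0)**2       # N(2, 1), up to an additive constant

rng = np.random.default_rng(0)
n_particles, n_steps, step = 2000, 200, 0.05
betas = np.linspace(0.0, 1.0, n_steps + 1)            # annealing schedule
x = rng.standard_normal(n_particles)
log_w = np.zeros(n_particles)
for k in range(n_steps):
    # Jarzynski/AIS weight increment: work done by switching the interpolated log-density
    log_w += (betas[k + 1] - betas[k]) * (log_target(x) - log_prior(x))
    # unadjusted Langevin step targeting the new interpolated distribution
    grad = -(1 - betas[k + 1]) * x - betas[k + 1] * (x - 2.0)
    x = x + step * grad + np.sqrt(2 * step) * rng.standard_normal(n_particles)

w = np.exp(log_w - log_w.max())
print("weighted mean:", np.sum(w * x) / np.sum(w))     # should be close to the target mean 2.0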
From continuous-time formulations to discretization schemes: tensor trains and robust regression for BSDEs and parabolic PDEs
Richter, Lorenz, Sallandt, Leon, Nüsken, Nikolas
The numerical approximation of partial differential equations (PDEs) poses formidable challenges in high dimensions since classical grid-based methods suffer from the so-called curse of dimensionality. Recent attempts rely on a combination of Monte Carlo methods and variational formulations, using neural networks for function approximation. Extending previous work (Richter et al., 2021), we argue that tensor trains provide an appealing framework for parabolic PDEs: The combination of reformulations in terms of backward stochastic differential equations and regression-type methods holds the promise of leveraging latent low-rank structures, enabling both compression and efficient computation. Emphasizing a continuous-time viewpoint, we develop iterative schemes, which differ in terms of computational efficiency and robustness. We demonstrate both theoretically and numerically that our methods can achieve a favorable trade-off between accuracy and computational efficiency. While previous methods have been either accurate or fast, we have identified a novel numerical strategy that can often combine both of these aspects.
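To make the low-rank idea tangible, here is a bare-bones TT-SVD in numpy that compresses a small third-order tensor into tensor-train cores and reconstructs it exactly. The tensor below is an artificial rank-2 example; the coupling with backward SDEs and the paper's iterative regression schemes are not reproduced.

# Sketch: sequential-SVD construction of a tensor train (TT-SVD) and its reconstruction.
import numpy as np

def tt_svd(tensor, max_rank):
    """Return TT cores of shape (r_prev, n_k, r_next) obtained by truncated SVDs."""
    dims, cores, r_prev = tensor.shape, [], 1
    mat = tensor
    for k in range(len(dims) - 1):
        mat = mat.reshape(r_prev * dims[k], -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(r_prev, dims[k], r))
        mat = np.diag(s[:r]) @ vt[:r]
        r_prev = r
    cores.append(mat.reshape(r_prev, dims[-1], 1))
    return cores

def tt_to_full(cores):
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.squeeze(axis=(0, -1))

rng = np.random.default_rng(0)
u1, u2, u3 = rng.standard_normal((3, 2, 10))                 # factors of a 10 x 10 x 10 rank-2 tensor
full = np.einsum('ri,rj,rk->ijk', u1, u2, u3)
cores = tt_svd(full, max_rank=2)
n_params = sum(c.size for c in cores)
print(np.allclose(tt_to_full(cores), full), n_params, full.size)   # True, 80 vs 1000 entries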
Bayesian Learning via Neural Schr\"odinger-F\"ollmer Flows
Vargas, Francisco, Ovsianas, Andrius, Fernandes, David, Girolami, Mark, Lawrence, Neil D., Nüsken, Nikolas
In this work we explore a new framework for approximate Bayesian inference in large datasets based on stochastic control. We advocate stochastic control as a finite-time, low-variance alternative to popular steady-state methods such as stochastic gradient Langevin dynamics (SGLD). Furthermore, we discuss and adapt the existing theoretical guarantees of this framework and establish connections to existing VI routines in SDE-based models.
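For reference, the sketch below is a minimal implementation of the steady-state baseline mentioned above, stochastic gradient Langevin dynamics, on a toy conjugate Gaussian model; the model, step size and iteration count are illustrative assumptions, and the paper's Schrödinger-Föllmer sampler is not shown.

# Sketch of the SGLD baseline (not the Schrödinger-Föllmer sampler): minibatch
# Langevin updates on the posterior mean of a 1D Gaussian model.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.0, size=1000)   # observations with known unit variance
prior_var, n, batch = 10.0, len(data), 50

def minibatch_grad_log_post(theta):
    idx = rng.choice(n, size=batch, replace=False)
    grad_lik = (n / batch) * np.sum(data[idx] - theta)   # rescaled minibatch likelihood gradient
    grad_prior = -theta / prior_var
    return grad_lik + grad_prior

theta, step, samples = 0.0, 1e-4, []
for _ in range(5000):
    theta += 0.5 * step * minibatch_grad_log_post(theta) + np.sqrt(step) * rng.standard_normal()
    samples.append(theta)

print("posterior mean estimate:", np.mean(samples[1000:]))   # close to the sample mean ~2.0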
Interpolating between BSDEs and PINNs -- deep learning for elliptic and parabolic boundary value problems
Nüsken, Nikolas, Richter, Lorenz
Solving high-dimensional partial differential equations is a recurrent challenge in economics, science and engineering. In recent years, a great number of computational approaches have been developed, most of them relying on a combination of Monte Carlo sampling and deep learning based approximation. For elliptic and parabolic problems, existing methods can broadly be classified into those resting on reformulations in terms of $\textit{backward stochastic differential equations}$ (BSDEs) and those aiming to minimize a regression-type $L^2$-error ($\textit{physics-informed neural networks}$, PINNs). In this paper, we review the literature and suggest a methodology based on the novel $\textit{diffusion loss}$ that interpolates between BSDEs and PINNs. Our contribution opens the door towards a unified understanding of numerical approaches for high-dimensional PDEs, as well as for implementations that combine the strengths of BSDEs and PINNs. We also provide generalizations to eigenvalue problems and perform extensive numerical studies, including calculations of the ground state for nonlinear Schr\"odinger operators and committor functions relevant in molecular dynamics.
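To illustrate the PINN side of this comparison, the snippet below evaluates a PINN-style loss, the mean squared PDE residual at random collocation points, for the heat-type equation u_t + (1/2) u_xx = 0; the candidate solution, collocation domain and finite-difference derivatives are illustrative choices, and the paper's diffusion loss is not reproduced.

# Sketch: a PINN-style residual loss. The exact solution u(t, x) = x^2 + (T - t)
# of u_t + 0.5 * u_xx = 0 should yield a (numerically) vanishing loss.
import numpy as np

T = 1.0
u = lambda t, x: x**2 + (T - t)                       # candidate solution

def pinn_loss(u, n_points=1000, eps=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    t = rng.uniform(0.0, T, n_points)
    x = rng.uniform(-2.0, 2.0, n_points)
    u_t  = (u(t + eps, x) - u(t - eps, x)) / (2 * eps)           # finite-difference time derivative
    u_xx = (u(t, x + eps) - 2 * u(t, x) + u(t, x - eps)) / eps**2
    residual = u_t + 0.5 * u_xx
    return np.mean(residual**2)

print(pinn_loss(u))    # close to zero; a wrong candidate would give a large value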
Stein Variational Gradient Descent: many-particle and long-time asymptotics
Nüsken, Nikolas, Renger, D. R. Michiel
Stein variational gradient descent (SVGD) refers to a class of methods for Bayesian inference based on interacting particle systems. In this paper, we consider the originally proposed deterministic dynamics as well as a stochastic variant, each of which represents one of the two main paradigms in Bayesian computational statistics: variational inference and Markov chain Monte Carlo. As it turns out, these are tightly linked through a correspondence between gradient flow structures and large-deviation principles rooted in statistical physics. To expose this relationship, we develop the cotangent space construction for the Stein geometry, prove its basic properties, and determine the large-deviation functional governing the many-particle limit for the empirical measure. Moreover, we identify the Stein-Fisher information (or kernelised Stein discrepancy) as its leading order contribution in the long-time and many-particle regime in the sense of $\Gamma$-convergence, shedding some light on the finite-particle properties of SVGD. Finally, we establish a comparison principle between the Stein-Fisher information and RKHS-norms that might be of independent interest.
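For concreteness, the sketch below runs the standard SVGD particle update with an RBF kernel on a toy two-dimensional Gaussian target; the kernel bandwidth, step size and target are illustrative choices. The paper studies the many-particle and long-time behaviour of exactly this kind of update rather than its implementation.

# Sketch: the textbook SVGD update (RBF kernel) on a 2D Gaussian target.
import numpy as np

def rbf_kernel(x, h):
    diff = x[:, None, :] - x[None, :, :]                  # (n, n, d)
    sq = np.sum(diff**2, axis=-1)
    k = np.exp(-sq / (2 * h**2))
    grad_k = -diff / h**2 * k[..., None]                  # gradient w.r.t. the first argument
    return k, grad_k

def svgd_step(x, grad_log_p, step=0.1, h=0.5):
    n = x.shape[0]
    k, grad_k = rbf_kernel(x, h)
    phi = (k @ grad_log_p(x) + grad_k.sum(axis=0)) / n    # kernelised Stein update direction
    return x + step * phi

mu = np.array([1.0, -2.0])                                # target N(mu, I), so grad log p(x) = mu - x
grad_log_p = lambda x: mu - x

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 2))
for _ in range(500):
    x = svgd_step(x, grad_log_p)
print("particle mean:", x.mean(axis=0))                   # close to mu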
Solving high-dimensional parabolic PDEs using the tensor train format
Richter, Lorenz, Sallandt, Leon, Nüsken, Nikolas
High-dimensional partial differential equations (PDEs) are ubiquitous in economics, science and engineering. However, their numerical treatment poses formidable challenges since traditional grid-based methods tend to be frustrated by the curse of dimensionality. In this paper, we argue that tensor trains provide an appealing approximation framework for parabolic PDEs: the combination of reformulations in terms of backward stochastic differential equations and regression-type methods in the tensor format holds the promise of leveraging latent low-rank structures, enabling both compression and efficient computation.
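The regression-type building block referred to here can be illustrated with a single least-squares Monte Carlo step: a conditional expectation E[g(X_T) | X_t] is approximated by regressing simulated payoffs onto basis functions of X_t. The polynomial basis below is only a stand-in for the tensor-train ansatz; the dynamics and payoff are illustrative choices.

# Sketch: least-squares regression of simulated payoffs onto basis functions of the
# current state, approximating E[g(X_T) | X_t] for Brownian dynamics and g(y) = y^2.
import numpy as np

rng = np.random.default_rng(0)
n_paths, t, T = 50_000, 0.5, 1.0
x_t = rng.standard_normal(n_paths)                          # states at time t
x_T = x_t + np.sqrt(T - t) * rng.standard_normal(n_paths)   # Brownian transition to time T
payoff = x_T**2                                             # g(X_T)

features = np.vander(x_t, N=4, increasing=True)             # basis 1, x, x^2, x^3
coeffs, *_ = np.linalg.lstsq(features, payoff, rcond=None)

x_query = np.array([0.0, 1.0, 2.0])
pred = np.vander(x_query, N=4, increasing=True) @ coeffs
print(pred)                                                 # exact answer is x^2 + (T - t)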
VarGrad: A Low-Variance Gradient Estimator for Variational Inference
Richter, Lorenz, Boustati, Ayman, Nüsken, Nikolas, Ruiz, Francisco J. R., Akyildiz, Ömer Deniz
We analyse the properties of an unbiased gradient estimator of the ELBO for variational inference, based on the score function method with leave-one-out control variates. We show that this gradient estimator can be obtained using a new loss, defined as the variance of the log-ratio between the exact posterior and the variational approximation, which we call the $\textit{log-variance loss}$. Under certain conditions, the gradient of the log-variance loss equals the gradient of the (negative) ELBO. We show theoretically that this gradient estimator, which we call $\textit{VarGrad}$ due to its connection to the log-variance loss, exhibits lower variance than the score function method in certain settings, and that the leave-one-out control variate coefficients are close to the optimal ones. We empirically demonstrate that VarGrad offers a favourable variance versus computation trade-off compared to other state-of-the-art estimators on a discrete VAE.
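The estimator can be written down in a few lines: draw samples from the variational approximation, form the log-ratio between it and the joint density, centre it, and pair it with the score of the variational family. The sketch below does this for a toy conjugate Gaussian model with a single mean parameter (the model and family are illustrative assumptions) and compares the result with the exact gradient of the KL divergence.

# Sketch: a VarGrad-style estimate, i.e. the score-function gradient of the negative ELBO
# with a leave-one-out control variate, on a toy conjugate Gaussian model.
import numpy as np

x_obs = 2.0                                                        # prior N(0,1), likelihood N(z,1)
log_joint = lambda z: -0.5 * z**2 - 0.5 * (x_obs - z)**2           # log p(z) + log p(x|z), up to constants
log_q     = lambda z, m: -0.5 * (z - m)**2                         # q_m = N(m, 1), up to constants
score_q   = lambda z, m: z - m                                     # d/dm log q_m(z)

def vargrad_estimate(m, n_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    z = m + rng.standard_normal(n_samples)
    f = log_q(z, m) - log_joint(z)                                 # log-ratio log(q / p)
    return np.sum((f - f.mean()) * score_q(z, m)) / (n_samples - 1)

m = 0.0
print("VarGrad estimate:", vargrad_estimate(m))
print("exact gradient  :", 2 * (m - 1.0))      # posterior is N(1, 0.5), so d/dm KL = (m - 1)/0.5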
Solving high-dimensional Hamilton-Jacobi-Bellman PDEs using neural networks: perspectives from the theory of controlled diffusions and measures on path space
Nüsken, Nikolas, Richter, Lorenz
Hamilton-Jacobi-Bellman partial differential equations (HJB-PDEs) are of central importance in applied mathematics. Rooted in reformulations of classical mechanics [45] in the nineteenth century, they nowadays form the backbone of (stochastic) optimal control theory [81, 115], having a profound impact on neighbouring fields such as optimal transportation [109, 110], mean field games [20], backward stochastic differential equations (BSDEs) [19] and large deviations [39]. Applications in science and engineering abound; examples include stochastic filtering and data assimilation [79, 95], the simulation of rare events in molecular dynamics [51, 54, 119], and nonconvex optimisation [24]. Many of these applications involve HJB-PDEs in high-dimensional or even infinite-dimensional state spaces, posing a formidable challenge for their numerical treatment and in particular rendering grid-based schemes infeasible. In recent years, approaches to approximating the solutions of high-dimensional elliptic and parabolic PDEs have been developed combining well-known Feynman-Kac formulae with machine learning methodologies, seeking scalability and robustness in high-dimensional and complex scenarios [50, 111]. Crucially, the use of artificial neural networks offers the promise of accurate and efficient function approximation which in conjunction with Monte Carlo methods can beat the curse of dimensionality, as investigated in [5, 25, 49, 60].
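A concrete instance of the Feynman-Kac route mentioned at the end: for Brownian dynamics with quadratic control costs and terminal cost g, the Cole-Hopf (log) transform gives V(0, x) = -log E[exp(-g(x + W_T))], so the value function of the HJB equation can be estimated by plain Monte Carlo. The quadratic g below admits a closed form to check against; this is an elementary illustration rather than any of the cited algorithms.

# Sketch: Monte Carlo evaluation of V(0, x) = -log E[exp(-g(x + W_T))], the log-transformed
# Feynman-Kac representation of an HJB value function with quadratic running cost.
import numpy as np

def value_mc(x, g, T=1.0, n_samples=500_000, seed=0):
    rng = np.random.default_rng(seed)
    w_T = np.sqrt(T) * rng.standard_normal(n_samples)
    return -np.log(np.mean(np.exp(-g(x + w_T))))

g = lambda y: 0.5 * y**2
x, T = 1.0, 1.0
estimate = value_mc(x, g, T)
exact = 0.5 * np.log(1 + T) + x**2 / (2 * (1 + T))   # closed form for this quadratic terminal cost
print(estimate, exact)                               # the two values should agree closely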