 malliavin derivative


Malliavin Calculus with Weak Derivatives for Counterfactual Stochastic Optimization

arXiv.org Artificial Intelligence

We study counterfactual stochastic optimization of conditional loss functionals under misspecified and noisy gradient information. The difficulty is that when the conditioning event has vanishing or zero probability, naive Monte Carlo estimators are prohibitively inefficient; kernel smoothing, though common, suffers from slow convergence. We propose a two-stage kernel-free methodology. First, we show using Malliavin calculus that the conditional loss functional of a diffusion process admits an exact representation as a Skorohod integral, yielding variance comparable to classical Monte Carlo variance. Second, we establish that a weak derivative estimate of the conditional loss functional with respect to model parameters can be evaluated with constant variance, in contrast to the widely used score function method, whose variance grows linearly in the sample path length. Together, these results yield an efficient framework for counterfactual conditional stochastic gradient algorithms in rare-event regimes.
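The variance growth of the score function method mentioned above can be seen in a toy setting. The sketch below is an illustration only and is not the paper's estimator or model: it assumes a discrete-time AR(1) chain $X_{k+1} = a X_k + \theta + \varepsilon_k$ with standard normal noise, a terminal loss $f(X_T) = X_T$, and hypothetical names (`score_function_gradient_samples`, `a`, `theta`). It does not implement the weak derivative estimator.

```python
import numpy as np


def score_function_gradient_samples(T, n_paths=20000, a=0.5, theta=0.0, seed=0):
    """Per-path score-function estimates of d/dtheta E[X_T].

    Toy assumption: X_{k+1} = a * X_k + theta + eps_k, eps_k ~ N(0, 1), X_0 = 0.
    """
    rng = np.random.default_rng(seed)
    eps = rng.standard_normal((n_paths, T))
    x = np.zeros(n_paths)
    for k in range(T):
        x = a * x + theta + eps[:, k]
    # Each transition is N(a * x + theta, 1), so the derivative of the path
    # log-likelihood with respect to theta is the sum of the noise terms.
    score = eps.sum(axis=1)
    return x * score  # loss f(X_T) = X_T times the path score


if __name__ == "__main__":
    for T in (50, 100, 200):
        g = score_function_gradient_samples(T)
        print(f"T={T:4d}  mean={g.mean():.3f}  variance={g.var():.1f}")
```

In this toy model the exact gradient is $(1 - a^T)/(1 - a)$, which stays bounded as $T$ grows, while the sample variance of the per-path estimates grows roughly in proportion to $T$.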


A Malliavin calculus approach to score functions in diffusion generative models

arXiv.org Machine Learning

Score-based diffusion generative models have recently emerged as a powerful tool for modelling complex data distributions. These models aim at learning the score function, which defines a map from a known probability distribution to the target data distribution via deterministic or stochastic differential equations (SDEs). The score function is typically estimated from data using a variety of approximation techniques, such as denoising or sliced score matching, Hyvärinen's method, or Schrödinger bridges. In this paper, we derive an exact, closed-form expression for the score function for a broad class of nonlinear diffusion generative models. Our approach combines modern stochastic analysis tools such as Malliavin derivatives and their adjoint operators (Skorokhod integrals or Malliavin divergence) with a new Bismut-type formula. The resulting expression for the score function can be written entirely in terms of the first and second variation processes, with all Malliavin derivatives systematically eliminated, thereby enhancing its practical applicability. The theoretical framework presented in this work offers a principled foundation for advancing score estimation methods in generative modelling, enabling the design of new sampling algorithms for complex probability distributions. Our results can be extended to broader classes of stochastic differential equations, opening new directions for the development of score-based diffusion generative models.


Malliavin-Bismut Score-based Diffusion Models

arXiv.org Artificial Intelligence

We introduce a new framework that employs Malliavin calculus to derive explicit expressions for the score function -- i.e., the gradient of the log-density -- associated with solutions to stochastic differential equations (SDEs). Our approach integrates classical integration-by-parts techniques with modern tools, such as Bismut's formula and Malliavin calculus, to address linear and nonlinear SDEs. In doing so, we establish a rigorous connection between the Malliavin derivative, its adjoint (the Malliavin divergence or the Skorokhod integral), Bismut's formula, and diffusion generative models, thus providing a systematic method for computing $\nabla \log p_t(x)$. For the linear case, we present a detailed study proving that our formula is equivalent to the actual score function derived from the solution of the Fokker--Planck equation for linear SDEs. Additionally, we derive a closed-form expression for $\nabla \log p_t(x)$ for nonlinear SDEs with state-independent diffusion coefficients. These advancements provide fresh theoretical insights into the smoothness and structure of probability densities and practical implications for score-based generative modelling, including the design and analysis of new diffusion models. Moreover, our findings promote the adoption of the robust Malliavin calculus framework in machine learning research. These results directly apply to various pure and applied mathematics fields, such as generative modelling, the study of SDEs driven by fractional Brownian motion, and the Fokker--Planck equations associated with nonlinear SDEs.
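As a point of reference for the linear case, the score of a one-dimensional Ornstein--Uhlenbeck process is available in closed form from its Gaussian law. The example below is a standard textbook computation, not the paper's Malliavin--Bismut formula; the symbols $a$, $\sigma$, $x_0$, $m_t$ and $v_t$ are introduced here for illustration only.

$$
dX_t = -a X_t\,dt + \sigma\,dW_t, \qquad X_0 = x_0, \quad a > 0,\ \sigma > 0,
$$

$$
X_t \sim \mathcal{N}(m_t, v_t), \qquad m_t = x_0 e^{-a t}, \qquad v_t = \frac{\sigma^2}{2a}\left(1 - e^{-2 a t}\right),
$$

$$
\nabla \log p_t(x) = -\frac{x - m_t}{v_t}.
$$

Here $p_t$ solves the Fokker--Planck equation $\partial_t p_t = \partial_x\!\left(a x\, p_t\right) + \tfrac{\sigma^2}{2}\,\partial_{xx} p_t$, so any score formula for linear SDEs should reduce to this expression in this special case.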


A backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations

arXiv.org Artificial Intelligence

In this work, we propose a novel backward differential deep learning-based algorithm for solving high-dimensional nonlinear backward stochastic differential equations (BSDEs), where the deep neural network (DNN) models are trained not only on the inputs and labels but also on the differentials of the corresponding labels. This is motivated by the fact that differential deep learning can provide an efficient approximation of the labels and their derivatives with respect to inputs. The BSDEs are reformulated as differential deep learning problems by using Malliavin calculus. The Malliavin derivatives of the solution to a BSDE themselves satisfy another BSDE, thus resulting in a system of BSDEs. Such a formulation requires the estimation of the solution, its gradient, and the Hessian matrix, represented by the triple of processes $\left(Y, Z, \Gamma\right)$. All the integrals within this system are discretized by using the Euler-Maruyama method. Subsequently, DNNs are employed to approximate the triple of these unknown processes. The DNN parameters are backwardly optimized at each time step by minimizing a differential learning type loss function, which is defined as a weighted sum of the dynamics of the discretized BSDE system, with the first term providing the dynamics of the process $Y$ and the second those of the process $Z$. An error analysis is carried out to show the convergence of the proposed algorithm. Various numerical experiments in up to $50$ dimensions are provided to demonstrate the high efficiency of the algorithm. Both theoretically and numerically, it is demonstrated that our proposed scheme is more efficient than other contemporary deep learning-based methodologies, especially in the computation of the process $\Gamma$.
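The Euler-Maruyama discretization mentioned in the abstract can be sketched for a generic scalar forward SDE $dX_t = \mu(X_t)\,dt + \sigma(X_t)\,dW_t$. This is a minimal illustration of the time-stepping rule only; it is not the paper's BSDE scheme and involves no neural networks. The function name `euler_maruyama` and its parameters are hypothetical.

```python
import numpy as np


def euler_maruyama(drift, diffusion, x0, T, n_steps, n_paths, seed=0):
    """Simulate dX = drift(X) dt + diffusion(X) dW on [0, T].

    Returns an array of shape (n_paths, n_steps + 1) holding every path.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, float(x0))
    paths = np.empty((n_paths, n_steps + 1))
    paths[:, 0] = x
    for n in range(n_steps):
        # Brownian increment over one step: N(0, dt)
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)
        x = x + drift(x) * dt + diffusion(x) * dw
        paths[:, n + 1] = x
    return paths


if __name__ == "__main__":
    # Ornstein-Uhlenbeck example: dX = -X dt + 0.5 dW, X_0 = 1
    paths = euler_maruyama(
        drift=lambda x: -x,
        diffusion=lambda x: 0.5 * np.ones_like(x),
        x0=1.0, T=1.0, n_steps=200, n_paths=10000,
    )
    print("sample mean of X_T:", paths[:, -1].mean())
```

With the diffusion set to zero the scheme reduces to the explicit Euler method for the ordinary differential equation $\dot{x} = \mu(x)$, which gives a simple sanity check.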