
Collaborating Authors

 Li, Wuchen


A Natural Primal-Dual Hybrid Gradient Method for Adversarial Neural Network Training on Solving Partial Differential Equations

arXiv.org Artificial Intelligence

We propose a scalable preconditioned primal-dual hybrid gradient algorithm for solving partial differential equations (PDEs). We multiply the PDE by a dual test function to obtain an inf-sup problem whose loss functional involves lower-order differential operators. The Primal-Dual Hybrid Gradient (PDHG) algorithm is then leveraged for this saddle point problem. By introducing suitable preconditioning operators to the proximal steps in the PDHG algorithm, we obtain an alternative natural gradient ascent-descent optimization scheme for updating the neural network parameters. We apply the Krylov subspace method (MINRES) to evaluate the natural gradients efficiently. Such treatment readily handles the inversion of the preconditioning matrices via matrix-vector multiplications. A posteriori convergence analysis is established for the time-continuous version of the proposed method. The algorithm is tested on various types of PDEs with dimensions ranging from $1$ to $50$, including linear and nonlinear elliptic equations, reaction-diffusion equations, and Monge-Amp\`ere equations stemming from $L^2$ optimal transport problems. We compare the performance of the proposed method with several commonly used deep learning algorithms for solving PDEs, such as physics-informed neural networks (PINNs), the Deep Ritz method, and weak adversarial networks (WANs), trained with the Adam and L-BFGS optimizers. The numerical results suggest that the proposed method performs efficiently and robustly and converges more stably.
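
A minimal sketch, under assumptions, of the linear-algebra step the abstract describes: applying MINRES to a symmetric preconditioning (metric) matrix accessed only through matrix-vector products, so a natural gradient direction is obtained without ever forming or inverting the matrix. The Gauss-Newton-style metric M = J^T J + eps*I and the random stand-in Jacobian are illustrative assumptions, not the paper's operators.

```python
# Sketch: natural gradient direction via MINRES with matrix-vector products only.
import numpy as np
from scipy.sparse.linalg import LinearOperator, minres

rng = np.random.default_rng(0)
n_params, n_samples = 200, 500
J = rng.standard_normal((n_samples, n_params))   # stand-in Jacobian of residuals w.r.t. parameters
grad = J.T @ rng.standard_normal(n_samples)      # stand-in Euclidean gradient
eps = 1e-3                                       # damping keeps the metric positive definite

def metric_matvec(v):
    # M v = J^T (J v) + eps * v, without forming M explicitly
    return J.T @ (J @ v) + eps * v

M = LinearOperator((n_params, n_params), matvec=metric_matvec)

# Natural gradient direction: solve M d = grad with MINRES (M is symmetric).
d, info = minres(M, grad)
print("MINRES exit flag:", info)
print("residual norm:", np.linalg.norm(metric_matvec(d) - grad))
```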


Score-based Neural Ordinary Differential Equations for Computing Mean Field Control Problems

arXiv.org Artificial Intelligence

Classical neural ordinary differential equations (ODEs) are powerful tools for approximating the log-density functions in high-dimensional spaces along trajectories, where neural networks parameterize the velocity fields. This paper proposes a system of neural differential equations representing first- and second-order score functions along trajectories based on deep neural networks. We reformulate the mean field control (MFC) problem with individual noises into an unconstrained optimization problem framed by the proposed neural ODE system. Additionally, we introduce a novel regularization term to enforce characteristics of viscous Hamilton-Jacobi-Bellman (HJB) equations to be satisfied based on the evolution of the second-order score function. Examples include regularized Wasserstein proximal operators (RWPOs), probability flow matching of Fokker-Planck (FP) equations, and linear quadratic (LQ) MFC problems, which demonstrate the effectiveness and accuracy of the proposed method. Score functions have been widely used in modern machine learning algorithms, particularly generative models through time-reversible diffusion (Song et al., 2021). The score function can be viewed as a deterministic representation of diffusion in stochastic trajectories (Carrillo et al., 2019). These properties have inspired algorithms for simulating stochastic trajectories or for sampling problems that converge to target distributions (Wang et al., 2022; Lu et al., 2024). Typical applications include modeling the time evolution of probability densities for stochastic dynamics and solving control problems constrained by such dynamics. While score functions provide powerful tools for modeling stochastic trajectories, their computation is often inefficient, especially in high-dimensional spaces. Classical methods, such as kernel density estimation (KDE) (Chen, 2017), tend to perform poorly in such settings due to the curse of dimensionality (Terrell & Scott, 1992). Recently, neural ODEs have emerged as an efficient way of estimating densities: one uses neural networks to parameterize the velocity fields and then approximates the logarithm of the density function along trajectories. The time discretizations of neural ODEs can be viewed as normalizing flows in generative models.
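
A minimal sketch, under assumptions, of the mechanism such neural ODEs rely on: a neural-network velocity field transports particles, and the log-density is propagated along each trajectory via the instantaneous change-of-variables relation d/dt log rho = -div v. The tiny MLP, forward-Euler stepping, and standard-normal initial law are illustrative choices, not the paper's first/second-order score system.

```python
# Sketch: transporting log-density along neural-ODE trajectories.
import math
import torch

torch.manual_seed(0)
dim = 2
velocity = torch.nn.Sequential(
    torch.nn.Linear(dim + 1, 64), torch.nn.Tanh(), torch.nn.Linear(64, dim)
)

def velocity_and_divergence(f, x, t):
    # Exact divergence via autograd (adequate in low dimensions).
    x = x.requires_grad_(True)
    v = f(torch.cat([x, t.expand(x.shape[0], 1)], dim=1))
    div = torch.zeros(x.shape[0])
    for i in range(dim):
        div += torch.autograd.grad(v[:, i].sum(), x, create_graph=True)[0][:, i]
    return v, div

# Initial particles from N(0, I), whose log-density is known in closed form.
x = torch.randn(128, dim)
logp = -0.5 * (x ** 2).sum(dim=1) - 0.5 * dim * math.log(2 * math.pi)

dt, steps = 0.05, 20
for k in range(steps):
    t = torch.tensor([[k * dt]])
    v, div = velocity_and_divergence(velocity, x, t)
    x = (x + dt * v).detach()          # forward Euler for the particles
    logp = logp - dt * div.detach()    # d/dt log rho = -div v along trajectories

print(x.shape, logp.shape)
```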


Wasserstein proximal operators describe score-based generative models and resolve memorization

arXiv.org Artificial Intelligence

We focus on the fundamental mathematical structure of score-based generative models (SGMs). We first formulate SGMs in terms of the Wasserstein proximal operator (WPO) and demonstrate that, via mean-field games (MFGs), the WPO formulation reveals mathematical structure that describes the inductive bias of diffusion and score-based models. In particular, MFGs yield optimality conditions in the form of a pair of coupled partial differential equations: a forward-controlled Fokker-Planck (FP) equation, and a backward Hamilton-Jacobi-Bellman (HJB) equation. Via a Cole-Hopf transformation and taking advantage of the fact that the cross-entropy can be related to a linear functional of the density, we show that the HJB equation is an uncontrolled FP equation. Second, with the mathematical structure at hand, we present an interpretable kernel-based model for the score function which dramatically improves the performance of SGMs in terms of the number of training samples and the training time. In addition, the WPO-informed kernel model is explicitly constructed to avoid the recently studied memorization effects of score-based generative models. The mathematical form of the new kernel-based models in combination with the use of the terminal condition of the MFG reveals new explanations for the manifold learning and generalization properties of SGMs, and provides a resolution to their memorization effects. Finally, our mathematically informed, interpretable kernel-based model suggests new scalable bespoke neural network architectures for high-dimensional applications.
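
A minimal sketch, under assumptions, of the object being modeled: the score grad log p_sigma(x) of a kernel-smoothed empirical distribution, written as a softmax-weighted combination of training samples. This plain Gaussian-kernel model is a simplified stand-in for the WPO-informed kernel model in the paper; it also illustrates the memorization issue, since as sigma -> 0 the score points at the nearest training sample.

```python
# Sketch: score of a Gaussian-kernel density built from training samples.
import numpy as np

def kernel_score(x, data, sigma):
    """Score of p_sigma = (1/N) sum_i N(.; y_i, sigma^2 I) at a query point x."""
    diffs = data - x                                    # (N, d) rows y_i - x
    logw = -np.sum(diffs ** 2, axis=1) / (2 * sigma ** 2)
    w = np.exp(logw - logw.max())
    w = w / w.sum()                                     # softmax weights over samples
    return (w[:, None] * diffs).sum(axis=0) / sigma ** 2

rng = np.random.default_rng(0)
data = rng.standard_normal((500, 2))                    # stand-in training set
x = np.array([0.5, -0.3])
print(kernel_score(x, data, sigma=0.5))
```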


Fisher information dissipation for time inhomogeneous stochastic differential equations

arXiv.org Artificial Intelligence

We provide a Lyapunov convergence analysis for time-inhomogeneous variable coefficient stochastic differential equations (SDEs). Three typical examples include overdamped, irreversible drift, and underdamped Langevin dynamics. We first formulate the probability transition equation of Langevin dynamics as a modified gradient flow of the Kullback-Leibler divergence in the probability space with respect to time-dependent optimal transport metrics. This formulation contains both gradient and non-gradient directions, depending on a class of time-dependent target distributions. We then select a time-dependent relative Fisher information functional as a Lyapunov functional. We develop a time-dependent Hessian matrix condition, which guarantees the convergence of the probability density function of the SDE. We verify the proposed conditions for several time-inhomogeneous Langevin dynamics. For the overdamped Langevin dynamics, we prove the $O(t^{-1/2})$ convergence in $L^1$ distance for the simulated annealing dynamics with a strongly convex potential function. For the irreversible drift Langevin dynamics, we prove an improved convergence towards the target distribution in an asymptotic regime. We also verify the convergence condition for the underdamped Langevin dynamics. Numerical examples demonstrate the convergence results for the time-dependent Langevin dynamics.
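
A minimal sketch, with illustrative choices of potential and cooling schedule, of one of the time-inhomogeneous dynamics analyzed: overdamped Langevin dynamics with a time-dependent temperature (simulated annealing), dX_t = -grad V(X_t) dt + sqrt(2/beta(t)) dW_t, simulated with Euler-Maruyama.

```python
# Sketch: Euler-Maruyama for annealed (time-inhomogeneous) overdamped Langevin dynamics.
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    return x                        # strongly convex potential V(x) = |x|^2 / 2 (illustrative)

def beta(t):
    return 1.0 + np.log(1.0 + t)    # increasing inverse temperature (illustrative schedule)

dt, steps, n_particles = 1e-2, 5000, 2000
x = rng.standard_normal(n_particles) * 3.0
for k in range(steps):
    t = k * dt
    noise = rng.standard_normal(n_particles)
    x = x - grad_V(x) * dt + np.sqrt(2.0 * dt / beta(t)) * noise

# As beta(t) grows, the law of X_t concentrates near the minimizer of V.
print("mean:", x.mean(), "variance:", x.var())
```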


Noise-Free Sampling Algorithms via Regularized Wasserstein Proximals

arXiv.org Machine Learning

We consider the problem of sampling from a distribution governed by a potential function. This work proposes an explicit score-based MCMC method that is deterministic, resulting in a deterministic evolution for particles rather than a stochastic differential equation evolution. The score term is given in closed form by a regularized Wasserstein proximal, using a kernel convolution that is approximated by sampling. We demonstrate fast convergence on various problems and show improved dimensional dependence of mixing time bounds for the case of Gaussian distributions compared to the unadjusted Langevin algorithm (ULA) and the Metropolis-adjusted Langevin algorithm (MALA). We additionally derive closed-form expressions for the distributions at each iterate for quadratic potential functions, characterizing the variance reduction. Empirical results demonstrate that the particles behave in an organized manner, lying on level set contours of the potential. Moreover, the posterior mean estimator of the proposed method is shown to be closer to the maximum a posteriori estimator compared to ULA and MALA in the context of Bayesian logistic regression. Additional examples demonstrate competitive performance for Bayesian neural network training.
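
A minimal sketch, under assumptions, of a deterministic score-driven particle sampler in this spirit: particles follow x <- x - dt*(grad V(x) + grad log rho(x)), so their empirical law flows toward exp(-V) without injected noise. Here the score is estimated with a plain Gaussian-kernel density over the current particles, a simplified stand-in for the paper's closed-form regularized Wasserstein proximal kernel; the potential and step sizes are illustrative.

```python
# Sketch: noise-free particle flow driven by a kernel-estimated score.
import numpy as np

rng = np.random.default_rng(0)

def grad_V(x):
    return x                                    # target exp(-V) ~ standard normal (illustrative)

def kde_score(x, sigma):
    # grad log of the kernel density built from the particle set itself.
    diffs = x[None, :, :] - x[:, None, :]       # diffs[i, j] = x_j - x_i
    logw = -np.sum(diffs ** 2, axis=2) / (2 * sigma ** 2)
    w = np.exp(logw - logw.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)
    return np.einsum("ij,ijd->id", w, diffs) / sigma ** 2

n, d, dt, sigma = 300, 2, 0.02, 0.3
x = rng.uniform(-4, 4, size=(n, d))             # arbitrary initialization
for _ in range(500):
    x = x - dt * (grad_V(x) + kde_score(x, sigma))

print("sample mean:", x.mean(axis=0), "per-coordinate variance:", x.var(axis=0))
```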


Scaling Limits of the Wasserstein information matrix on Gaussian Mixture Models

arXiv.org Machine Learning

We consider the Wasserstein metric on Gaussian mixture models (GMMs), which is defined as the pullback of the full Wasserstein metric on the space of smooth probability distributions with finite second moment. This construction yields a class of Wasserstein metrics on probability simplices over one-dimensional bounded homogeneous lattices via a scaling limit of the Wasserstein metric on GMMs. Specifically, for a sequence of GMMs whose variances tend to zero, we prove that the limit of the Wasserstein metric exists after certain renormalization. Generalizations of this metric to general GMMs are established, including inhomogeneous lattice models whose lattice gaps are not the same, extended GMMs whose mean parameters of the Gaussian components can also change, and the second-order metric containing high-order information of the scaling limit. We further study the Wasserstein gradient flows on GMMs for three typical functionals: potential, internal, and interaction energies. Numerical examples demonstrate the effectiveness of the proposed GMM models for approximating Wasserstein gradient flows.
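
As background for the pullback construction (a sketch, not the paper's scaling limit): in one dimension the full Wasserstein-2 distance between two Gaussian mixtures can be computed from quantile functions, $W_2^2(p,q) = \int_0^1 (F_p^{-1}(s) - F_q^{-1}(s))^2 ds$. The mixture parameters and grid resolutions below are illustrative.

```python
# Sketch: 1D Wasserstein-2 distance between Gaussian mixtures via quantile functions.
import numpy as np
from scipy.stats import norm

def mixture_cdf(x, weights, means, sigmas):
    return sum(w * norm.cdf(x, loc=m, scale=s) for w, m, s in zip(weights, means, sigmas))

def w2_1d(p, q, grid=np.linspace(-20, 20, 20001), levels=np.linspace(1e-4, 1 - 1e-4, 4000)):
    Fp, Fq = mixture_cdf(grid, *p), mixture_cdf(grid, *q)
    qp = np.interp(levels, Fp, grid)   # numerical quantile function of p
    qq = np.interp(levels, Fq, grid)   # numerical quantile function of q
    return np.sqrt(np.mean((qp - qq) ** 2))

p = ([0.5, 0.5], [-2.0, 2.0], [0.3, 0.3])   # weights, means, standard deviations
q = ([0.3, 0.7], [-2.0, 2.0], [0.3, 0.3])
print("W2(p, q) ~", w2_1d(p, q))
```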


Projected Wasserstein gradient descent for high-dimensional Bayesian inference

arXiv.org Machine Learning

We propose a projected Wasserstein gradient descent method (pWGD) for high-dimensional Bayesian inference problems. The underlying density function of a particle system of WGD is approximated by kernel density estimation (KDE), which faces the long-standing curse of dimensionality. We overcome this challenge by exploiting the intrinsic low-rank structure in the difference between the posterior and prior distributions. The parameters are projected into a low-dimensional subspace to alleviate the approximation error of KDE in high dimensions. We formulate a projected Wasserstein gradient flow and analyze its convergence property under mild assumptions. Several numerical experiments illustrate the accuracy, convergence, and complexity scalability of pWGD with respect to parameter dimension, sample size, and processor cores.
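
A minimal sketch, under simplifying assumptions, of the projection ingredient: build a low-dimensional subspace from likelihood-gradient information and carry out the KDE-based computations in the projected coordinates; the particle update itself would then proceed as in the deterministic particle sketch given earlier. The toy Gaussian likelihood and the eigenvector-based construction of the subspace are illustrative choices, not necessarily the paper's exact construction.

```python
# Sketch: projecting particles onto the subspace where posterior and prior differ.
import numpy as np

rng = np.random.default_rng(0)
d, r, n = 100, 2, 500                    # ambient dimension, subspace rank, particles

# Toy setup: the log-likelihood only depends on two directions of the parameter.
A = rng.standard_normal((2, d)) / np.sqrt(d)
y = np.array([1.0, -2.0])
def grad_log_lik(x):                     # rows of x are particles
    return (y - x @ A.T) @ A

x = rng.standard_normal((n, d))          # particles drawn from the N(0, I) prior

# Gradient information matrix H = E[grad log L grad log L^T]; its dominant
# eigenvectors span the directions where the likelihood actually informs the prior.
G = grad_log_lik(x)
H = G.T @ G / n
eigvals, eigvecs = np.linalg.eigh(H)
U = eigvecs[:, -r:]                      # (d, r) orthonormal basis of the subspace

z = x @ U                                # projected particles: KDE is done in R^r,
print(z.shape)                           # sidestepping KDE's curse of dimensionality
```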


Wasserstein Proximal of GANs

arXiv.org Artificial Intelligence

We introduce a new method for training generative adversarial networks by applying the Wasserstein-2 metric proximal to the generators. The approach is based on Wasserstein information geometry. It defines a parametrization-invariant natural gradient by pulling back optimal transport structures from probability space to parameter space. We obtain easy-to-implement iterative regularizers for the parameter updates of implicit deep generative models. Our experiments demonstrate that this method improves the speed and stability of training in terms of wall-clock time and Fr\'echet Inception Distance.
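
A minimal sketch, under assumptions, of the kind of easy-to-implement regularizer described: a Wasserstein-2 proximal penalty on how far the generator's output distribution moves within one update, approximated by the mean squared change of generator samples at the same latent codes. The networks, adversarial loss, penalty weight, and inner-step count are placeholders, and discriminator updates are omitted.

```python
# Sketch: approximate Wasserstein-2 proximal step for a GAN generator.
import copy
import torch

latent_dim, data_dim = 16, 2
generator = torch.nn.Sequential(torch.nn.Linear(latent_dim, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, data_dim))
discriminator = torch.nn.Sequential(torch.nn.Linear(data_dim, 64), torch.nn.ReLU(),
                                    torch.nn.Linear(64, 1))
discriminator.requires_grad_(False)            # discriminator updates omitted in this sketch
opt_g = torch.optim.SGD(generator.parameters(), lr=1e-3)
lam = 10.0                                     # proximal strength (illustrative)

for step in range(100):
    frozen = copy.deepcopy(generator).eval()   # g_{theta_k}, held fixed during the subproblem
    for _ in range(5):                         # a few inner steps approximate the proximal update
        z = torch.randn(128, latent_dim)
        fake = generator(z)
        adv_loss = -discriminator(fake).mean() # placeholder generator loss
        with torch.no_grad():
            fake_old = frozen(z)
        prox = ((fake - fake_old) ** 2).sum(dim=1).mean()  # ~ W_2^2 between g_theta and g_{theta_k}
        loss = adv_loss + 0.5 * lam * prox
        opt_g.zero_grad()
        loss.backward()
        opt_g.step()
```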


Transport information Bregman divergences

arXiv.org Artificial Intelligence

We study Bregman divergences in probability density space equipped with the $L^2$-Wasserstein metric. Several properties and dualities of transport Bregman divergences are provided. In particular, we derive the transport Kullback-Leibler (KL) divergence as a Bregman divergence of the negative Boltzmann-Shannon entropy in $L^2$-Wasserstein space. We also derive analytical formulas and generalizations of the transport KL divergence for one-dimensional probability densities and Gaussian families.
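
For orientation (a standard definition, not taken from the paper): the classical Bregman divergence of a convex functional $F$ is
$$ D_F(p \,\|\, q) \;=\; F(p) - F(q) - \langle \nabla F(q),\, p - q \rangle, $$
and the transport divergences studied here replace this flat $L^2$ differential structure with the $L^2$-Wasserstein geometry of probability densities, as described in the abstract above.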


Kernelized Wasserstein Natural Gradient

arXiv.org Machine Learning

Many machine learning problems can be expressed as the optimization of some cost functional over a parametric family of probability distributions. It is often beneficial to solve such optimization problems using natural gradient methods. These methods are invariant to the parametrization of the family and thus can yield more effective optimization. Unfortunately, computing the natural gradient is challenging, as it requires inverting a high-dimensional matrix at each iteration. We propose a general framework to approximate the natural gradient for the Wasserstein metric by leveraging a dual formulation of the metric restricted to a Reproducing Kernel Hilbert Space. Our approach leads to an estimator of the gradient direction that can trade off accuracy and computational cost, with theoretical guarantees. We verify its accuracy on simple examples, and empirically show the advantage of using such an estimator in classification tasks on CIFAR-10 and CIFAR-100.
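
A minimal sketch, not the paper's kernel estimator: a numerical check of the parametrization-invariance property that motivates natural gradient methods, with the metric supplied explicitly as an SPD matrix (the paper's contribution is precisely how to approximate the Wasserstein metric without forming and inverting such a matrix). The loss, metric, and reparametrization matrix below are illustrative.

```python
# Sketch: natural gradient updates agree across reparametrizations; plain gradients do not.
import numpy as np

rng = np.random.default_rng(0)
d = 5
A = rng.standard_normal((d, d)) + 3 * np.eye(d)      # invertible reparametrization phi = A theta
B = rng.standard_normal((d, d))
G_phi = B @ B.T + np.eye(d)                          # SPD metric in phi-coordinates
c = rng.standard_normal(d)

grad_phi = lambda phi: phi + c                       # gradient of 0.5*|phi|^2 + c.phi in phi-coordinates
grad_theta = lambda theta: A.T @ grad_phi(A @ theta) # same loss, pulled back to theta-coordinates
G_theta = A.T @ G_phi @ A                            # pulled-back metric

theta = rng.standard_normal(d)
phi = A @ theta
eta = 0.1

# One natural-gradient step in each coordinate system.
theta_new = theta - eta * np.linalg.solve(G_theta, grad_theta(theta))
phi_new = phi - eta * np.linalg.solve(G_phi, grad_phi(phi))

# The two updates describe the same point: A @ theta_new == phi_new up to round-off.
print(np.max(np.abs(A @ theta_new - phi_new)))

# Plain gradient descent lacks this invariance:
print(np.max(np.abs(A @ (theta - eta * grad_theta(theta)) - (phi - eta * grad_phi(phi)))))
```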