AITopics | jko step

Wasserstein gradient flows provide a powerful means of understanding and solving many diffusion equations. Specifically, Fokker-Planck equations, which model the diffusion of probability measures, can be understood as gradient descent over entropy functionals in Wasserstein space. This equivalence, introduced by Jordan, Kinderlehrer and Otto, inspired the so-called JKO scheme to approximate these diffusion processes via an implicit discretization of the gradient flow in Wasserstein space. Solving the optimization problem associated with each JKO step, however, presents serious computational challenges. We introduce a scalable method to approximate Wasserstein gradient flows, targeted to machine learning applications. Our approach relies on input-convex neural networks (ICNNs) to discretize the JKO steps, which can be optimized by stochastic gradient descent. Contrarily to previous work, our method does not require domain discretization or particle simulation. As a result, we can sample from the measure at each time step of the diffusion and compute its probability density. We demonstrate the performance of our algorithm by computing diffusions following the Fokker-Planck equation and apply it to unnormalized density sampling as well as nonlinear filtering.

electronic proceedings, large-scale wasserstein gradient flow, name change, (6 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.27)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.83)

Add feedback

A Experimental Details

Neural Information Processing SystemsAug-15-2025, 12:16:57 GMT

We set β to be equal to 1 throughout our experiments. The parameters are summarized in Table 3. The batch size is N = 512 .

dual jko null, experimental detail, jko step, (13 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.96)

Add feedback

810dfbbebb17302018ae903e9cb7a483-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 12:16:54 GMT

arxiv preprint arxiv, gradient flow, wasserstein gradient flow, (14 more...)

Neural Information Processing Systems

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Middle East > Jordan (0.04)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)

Add feedback

Large-Scale Wasserstein Gradient Flows

Neural Information Processing SystemsAug-12-2025, 13:39:02 GMT

Wasserstein gradient flows provide a powerful means of understanding and solving many diffusion equations. Specifically, Fokker-Planck equations, which model the diffusion of probability measures, can be understood as gradient descent over entropy functionals in Wasserstein space. This equivalence, introduced by Jordan, Kinderlehrer and Otto, inspired the so-called JKO scheme to approximate these diffusion processes via an implicit discretization of the gradient flow in Wasserstein space. Solving the optimization problem associated with each JKO step, however, presents serious computational challenges. We introduce a scalable method to approximate Wasserstein gradient flows, targeted to machine learning applications.

artificial intelligence, large-scale wasserstein gradient flow, machine learning, (5 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.29)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

Distributionally Robust Optimization via Iterative Algorithms in Continuous Probability Spaces

Zhu, Linglingzhi, Xie, Yao

arXiv.org Machine LearningDec-29-2024

We consider a minimax problem motivated by distributionally robust optimization (DRO) when the worst-case distribution is continuous, leading to significant computational challenges due to the infinite-dimensional nature of the optimization problem. Recent research has explored learning the worst-case distribution using neural network-based generative models to address these computational challenges but lacks algorithmic convergence guarantees. This paper bridges this theoretical gap by presenting an iterative algorithm to solve such a minimax problem, achieving global convergence under mild assumptions and leveraging technical tools from vector space minimax optimization and convex analysis in the space of continuous probability densities. In particular, leveraging Brenier's theorem, we represent the worst-case distribution as a transport map applied to a continuous reference measure and reformulate the regularized discrepancy-based DRO as a minimax problem in the Wasserstein space. Furthermore, we demonstrate that the worst-case distribution can be efficiently computed using a modified Jordan-Kinderlehrer-Otto (JKO) scheme with sufficiently large regularization parameters for commonly used discrepancy functions, linked to the radius of the ambiguity set. Additionally, we derive the global convergence rate and quantify the total number of subgradient and inexact modified JKO iterations required to obtain approximate stationary points. These results are potentially applicable to nonconvex and nonsmooth scenarios, with broad relevance to modern machine learning applications.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

2412.20556

Country: Asia > Middle East > Jordan (0.24)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)

Add feedback

Large-Scale Wasserstein Gradient Flows

Neural Information Processing SystemsOct-11-2024, 11:50:37 GMT

Wasserstein gradient flows provide a powerful means of understanding and solving many diffusion equations. Specifically, Fokker-Planck equations, which model the diffusion of probability measures, can be understood as gradient descent over entropy functionals in Wasserstein space. This equivalence, introduced by Jordan, Kinderlehrer and Otto, inspired the so-called JKO scheme to approximate these diffusion processes via an implicit discretization of the gradient flow in Wasserstein space. Solving the optimization problem associated with each JKO step, however, presents serious computational challenges. We introduce a scalable method to approximate Wasserstein gradient flows, targeted to machine learning applications.

fokker-planck equation, large-scale wasserstein gradient flow, wasserstein space, (3 more...)

Neural Information Processing Systems

Country: Asia > Middle East > Jordan (0.29)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.48)

Add feedback

Importance Corrected Neural JKO Sampling

Hertrich, Johannes, Gruhlke, Robert

arXiv.org Machine LearningJul-29-2024

In order to sample from an unnormalized probability density function, we propose to combine continuous normalizing flows (CNFs) with rejection-resampling steps based on importance weights. We relate the iterative training of CNFs with regularized velocity fields to a JKO scheme and prove convergence of the involved velocity fields to the velocity field of the Wasserstein gradient flow (WGF). The alternation of local flow steps and non-local rejection-resampling steps allows to overcome local minima or slow convergence of the WGF for multimodal distributions. Since the proposal of the rejection step is generated by the model itself, they do not suffer from common drawbacks of classical rejection schemes. The arising model can be trained iteratively, reduces the reverse Kulback-Leibler (KL) loss function in each step, allows to generate iid samples and moreover allows for evaluations of the generated underlying density. Numerical examples show that our method yields accurate results on various test distributions including high-dimensional multimodal targets and outperforms the state of the art in almost all cases significantly.

international conference, rejection step, wasserstein gradient flow, (13 more...)

arXiv.org Machine Learning

2407.20444

Country:

Asia > Middle East > Jordan (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Europe > Switzerland > Basel-City > Basel (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Add feedback

Scalable Wasserstein Gradient Flow for Generative Modeling through Unbalanced Optimal Transport

Choi, Jaemoo, Choi, Jaewoong, Kang, Myungjoo

arXiv.org Artificial IntelligenceFeb-8-2024

Wasserstein Gradient Flow (WGF) describes the gradient dynamics of probability density within the Wasserstein space. WGF provides a promising approach for conducting optimization over the probability distributions. Numerically approximating the continuous WGF requires the time discretization method. The most well-known method for this is the JKO scheme. In this regard, previous WGF models employ the JKO scheme and parametrize transport map for each JKO step. However, this approach results in quadratic training complexity $O(K^2)$ with the number of JKO step $K$. This severely limits the scalability of WGF models. In this paper, we introduce a scalable WGF-based generative model, called Semi-dual JKO (S-JKO). Our model is based on the semi-dual form of the JKO step, derived from the equivalence between the JKO step and the Unbalanced Optimal Transport. Our approach reduces the training complexity to $O(K)$. We demonstrate that our model significantly outperforms existing WGF-based generative models, achieving FID scores of 2.62 on CIFAR-10 and 6.19 on CelebA-HQ-256, which are comparable to state-of-the-art image generative models.

jko step, scalable wasserstein gradient flow, wasserstein gradient flow, (12 more...)

arXiv.org Artificial Intelligence

2402.05443

Country: