AITopics | Umut Simsekli

Generalized Sliced Wasserstein Distances

Soheil Kolouri, Kimia Nadjahi, Umut Simsekli, Roland Badeau, Gustavo Rohde

Neural Information Processing Systems | Mar-27-2025, 03:46:02 GMT

The Wasserstein distance and its variations, e.g., the sliced-Wasserstein (SW) distance, have recently drawn attention from the machine learning community. The SW distance, specifically, was shown to have similar properties to the Wasserstein distance, while being much simpler to compute, and is therefore used in various applications including generative modeling and general supervised/unsupervised learning. In this paper, we first clarify the mathematical connection between the SW distance and the Radon transform. We then utilize the generalized Radon transform to define a new family of distances for probability measures, which we call generalized sliced-Wasserstein (GSW) distances. We further show that, similar to the SW distance, the GSW distance can be extended to a maximum GSW (max-GSW) distance. We then provide the conditions under which GSW and max-GSW distances are indeed proper metrics. Finally, we compare the numerical performance of the proposed distances on the generative modeling task of SW flows and report favorable results.
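As a rough illustration of why the SW distance is simple to compute, the sketch below estimates it by Monte Carlo: draw random directions, project both point clouds onto each direction, and average the one-dimensional Wasserstein costs, which only require sorting. It covers the linear (Radon transform) case only; the GSW distances of the abstract replace the linear projection with a nonlinear one, which is not shown here. The function names, the equal-sample-size restriction, and the default number of projections are assumptions made for this example, not details from the paper.

```python
import math
import random

def random_direction(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]

def wasserstein_1d(xs, ys, p=2):
    """p-th power of the 1D Wasserstein-p distance between equal-size samples."""
    xs, ys = sorted(xs), sorted(ys)
    return sum(abs(a - b) ** p for a, b in zip(xs, ys)) / len(xs)

def sliced_wasserstein(X, Y, n_projections=200, p=2, seed=0):
    """Monte Carlo estimate of the SW-p distance between two point clouds."""
    if len(X) != len(Y):
        raise ValueError("this sketch assumes equal sample sizes")
    rng = random.Random(seed)
    dim = len(X[0])
    total = 0.0
    for _ in range(n_projections):
        theta = random_direction(dim, rng)
        # Linear slice: project every point onto the direction theta.
        px = [sum(t * c for t, c in zip(theta, x)) for x in X]
        py = [sum(t * c for t, c in zip(theta, y)) for y in Y]
        total += wasserstein_1d(px, py, p)
    return (total / n_projections) ** (1.0 / p)
```

Calling `sliced_wasserstein(X, X)` returns zero, and for a cloud shifted by a vector s the estimate cannot exceed the Euclidean norm of s, since each projected shift is at most that norm.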

artificial intelligence, machine learning, projection, (15 more...)

Neural Information Processing Systems

Country:

North America > United States > Virginia (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

Kimia Nadjahi, Alain Durmus, Umut Simsekli, Roland Badeau

Neural Information Processing Systems | Mar-26-2025, 21:44:01 GMT

Minimum expected distance estimation (MEDE) algorithms have been widely used for probabilistic models with intractable likelihood functions and they have become increasingly popular due to their use in implicit generative modeling (e.g.

artificial intelligence, estimator, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

Thanh Huy Nguyen, Umut Simsekli, Mert Gurbuzbalaban, Gaël Richard

Neural Information Processing Systems | Mar-26-2025, 12:49:37 GMT

Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings admits a non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled by using α-stable distributions, a family of heavy-tailed distributions that appear in the generalized central limit theorem. In this context, SGD can be viewed as a discretization of a stochastic differential equation (SDE) driven by a Lévy motion, and the metastability results for this SDE can then be used for illuminating the behavior of SGD, especially in terms of 'preferring wide minima'. While this approach brings a new perspective for analyzing SGD, it is limited in the sense that, due to the time discretization, SGD might admit a significantly different behavior than its continuous-time limit. Intuitively, the behaviors of these two systems are expected to be similar to each other only when the discretization step is sufficiently small; however, to the best of our knowledge, there is no theoretical understanding of how small the step-size should be chosen in order to guarantee that the discretized system inherits the properties of the continuous-time system. In this study, we provide a formal theoretical analysis where we derive explicit conditions for the step-size such that the metastability behavior of the discrete-time system is similar to its continuous-time limit. We show that the behaviors of the two systems are indeed similar for small step-sizes and we identify how the error depends on the algorithm and problem parameters. We illustrate our results with simulations on a synthetic model and neural networks.
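To make the notion of a first exit time concrete, here is a minimal one-dimensional simulation sketch. It assumes the usual Euler-type recursion x ← x − η·∇f(x) + ε·η^(1/α)·S with symmetric α-stable noise S, drawn with the Chambers-Mallows-Stuck method, and counts iterations until the iterate leaves an interval around its starting point. The function names, the restriction to 1 < α ≤ 2, and every parameter value are assumptions chosen for illustration; this is not the experimental setup of the paper.

```python
import math
import random

def symmetric_alpha_stable(alpha, rng):
    """One standard symmetric alpha-stable draw (Chambers-Mallows-Stuck)."""
    if not 1.0 < alpha <= 2.0:
        raise ValueError("this sketch assumes 1 < alpha <= 2")
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against a zero exponential draw
        w = rng.expovariate(1.0)
    return (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
            * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))

def first_exit_time(grad, x0, radius, step, alpha, noise_scale,
                    max_steps=100_000, seed=0):
    """Iterations until |x - x0| > radius; returns max_steps if no exit occurs."""
    rng = random.Random(seed)
    x = x0
    for k in range(1, max_steps + 1):
        noise = symmetric_alpha_stable(alpha, rng) if noise_scale else 0.0
        # Discretized dynamics: gradient step plus step^(1/alpha)-scaled noise.
        x = x - step * grad(x) + noise_scale * step ** (1.0 / alpha) * noise
        if abs(x - x0) > radius:
            return k
    return max_steps
```

For example, `first_exit_time(lambda x: x, 0.0, 1.0, 0.01, 1.5, 0.5)` measures how long the iterate stays within distance 1 of the minimum of f(x) = x²/2. Repeating the call over several seeds and step sizes gives an empirical view of how the exit time depends on the discretization.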

artificial intelligence, exit time, machine learning, (15 more...)

Neural Information Processing Systems

Country:

North America (0.46)
Europe (0.46)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC

Tolga Birdal, Umut Simsekli, Mustafa Onur Eken, Slobodan Ilic

Neural Information Processing Systems | Mar-26-2025, 02:01:53 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, artificial intelligence, machine learning, (13 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America (0.46)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

Kimia Nadjahi, Alain Durmus, Umut Simsekli, Roland Badeau

Neural Information Processing Systems | Jan-27-2025, 01:17:46 GMT

Minimum expected distance estimation (MEDE) algorithms have been widely used for probabilistic models with intractable likelihood functions and they have become increasingly popular due to their use in implicit generative modeling (e.g.

artificial intelligence, estimator, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

First Exit Time Analysis of Stochastic Gradient Descent Under Heavy-Tailed Gradient Noise

Thanh Huy Nguyen, Umut Simsekli, Mert Gurbuzbalaban, Gaël Richard

Neural Information Processing Systems | Jan-26-2025, 08:24:00 GMT

Stochastic gradient descent (SGD) has been widely used in machine learning due to its computational efficiency and favorable generalization properties. Recently, it has been empirically demonstrated that the gradient noise in several deep learning settings admits a non-Gaussian, heavy-tailed behavior. This suggests that the gradient noise can be modeled by using α-stable distributions, a family of heavy-tailed distributions that appear in the generalized central limit theorem. In this context, SGD can be viewed as a discretization of a stochastic differential equation (SDE) driven by a Lévy motion, and the metastability results for this SDE can then be used for illuminating the behavior of SGD, especially in terms of 'preferring wide minima'. While this approach brings a new perspective for analyzing SGD, it is limited in the sense that, due to the time discretization, SGD might admit a significantly different behavior than its continuous-time limit. Intuitively, the behaviors of these two systems are expected to be similar to each other only when the discretization step is sufficiently small; however, to the best of our knowledge, there is no theoretical understanding of how small the step-size should be chosen in order to guarantee that the discretized system inherits the properties of the continuous-time system. In this study, we provide a formal theoretical analysis where we derive explicit conditions for the step-size such that the metastability behavior of the discrete-time system is similar to its continuous-time limit. We show that the behaviors of the two systems are indeed similar for small step-sizes and we identify how the error depends on the algorithm and problem parameters. We illustrate our results with simulations on a synthetic model and neural networks.

artificial intelligence, exit time, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America (0.46)
Europe (0.46)

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo

Alain Durmus, Umut Simsekli, Eric Moulines, Roland Badeau, Gaël Richard

Neural Information Processing Systems | Oct-8-2024, 14:29:06 GMT

Stochastic Gradient Markov Chain Monte Carlo (SG-MCMC) algorithms have become increasingly popular for Bayesian inference in large-scale applications. Even though these methods have proved useful in several scenarios, their performance is often limited by their bias. In this study, we propose a novel sampling algorithm that aims to reduce the bias of SG-MCMC while keeping the variance at a reasonable level. Our approach is based on a numerical sequence acceleration method, namely the Richardson-Romberg extrapolation, which simply boils down to running almost the same SG-MCMC algorithm twice in parallel with different step sizes. We illustrate our framework on the popular Stochastic Gradient Langevin Dynamics (SGLD) algorithm and propose a novel SG-MCMC algorithm referred to as Stochastic Gradient Richardson-Romberg Langevin Dynamics (SGRRLD). We provide a formal theoretical analysis and show that SGRRLD is asymptotically consistent, satisfies a central limit theorem, and that its non-asymptotic bias and mean squared error can be bounded. Our results show that SGRRLD attains higher rates of convergence than SGLD both in finite time and asymptotically, and that it achieves the theoretical accuracy of methods based on higher-order integrators. We support our findings using both synthetic and real data experiments.
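The extrapolation idea can be sketched on a toy problem. The example below runs an Euler-discretized Langevin chain targeting a standard Gaussian with step sizes γ and γ/2, then combines the two estimates of E[x²] as 2·(fine) − (coarse), which cancels the first-order bias in γ. For this target the stationary second moment of the discretized chain is exactly 1/(1 − γ/2), so the cancellation can be checked in closed form. Several simplifications are assumptions of this sketch and not the paper's algorithm: exact rather than stochastic gradients, chains run one after the other with independent noise, and hypothetical function names.

```python
import random

def stationary_second_moment(step):
    """Exact stationary E[x^2] of the Euler-discretized Langevin chain for N(0, 1)."""
    return 1.0 / (1.0 - step / 2.0)

def langevin_second_moment(step, n_steps, rng, burn_in=1000):
    """Average of x^2 along one discretized Langevin chain targeting N(0, 1)."""
    x, acc = 0.0, 0.0
    for k in range(n_steps + burn_in):
        # Gradient of the potential x^2 / 2 is x; the noise scale is sqrt(2 * step).
        x = x - step * x + (2.0 * step) ** 0.5 * rng.gauss(0.0, 1.0)
        if k >= burn_in:
            acc += x * x
    return acc / n_steps

def rr_estimate(step, n_steps, seed=0):
    """Richardson-Romberg combination of chains with step sizes step and step / 2."""
    rng = random.Random(seed)
    coarse = langevin_second_moment(step, n_steps, rng)
    fine = langevin_second_moment(step / 2.0, n_steps, rng)
    return 2.0 * fine - coarse
```

With γ = 0.2, the single-chain stationary value is 1/0.9 ≈ 1.111, while the exact extrapolated value 2/0.95 − 1/0.9 ≈ 0.994 is much closer to the true second moment of 1. The simulated `rr_estimate` fluctuates around that extrapolated value, at the cost of a somewhat larger variance than a single chain.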

artificial intelligence, machine learning, sgrrld, (19 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)

Learning the Morphology of Brain Signals Using Alpha-Stable Convolutional Sparse Coding

Mainak Jas, Tom Dupré la Tour, Umut Simsekli, Alexandre Gramfort

Neural Information Processing Systems | Oct-8-2024, 00:21:36 GMT

Neural time-series data contain a wide variety of prototypical signal waveforms (atoms) that are of significant importance in clinical and cognitive research. One of the goals in analyzing such data is hence to extract these 'shift-invariant' atoms. Even though some success has been reported with existing algorithms, they are limited in applicability due to their heuristic nature. Moreover, they are often vulnerable to artifacts and impulsive noise, which are typically present in raw neural recordings. In this study, we address these issues and propose a novel probabilistic convolutional sparse coding (CSC) model for learning shift-invariant atoms from raw neural signals containing potentially severe artifacts.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Genre: Research Report > New Finding (0.48)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.50)
Health & Medicine > Health Care Technology (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.94)

Bayesian Pose Graph Optimization via Bingham Distributions and Tempered Geodesic MCMC

Tolga Birdal, Umut Simsekli, Mustafa Onur Eken, Slobodan Ilic

Neural Information Processing Systems | Oct-7-2024, 10:47:45 GMT

We introduce the Tempered Geodesic Markov Chain Monte Carlo (TG-MCMC) algorithm for initializing pose graph optimization problems, arising in various scenarios such as SFM (structure from motion) or SLAM (simultaneous localization and mapping). TG-MCMC is the first of its kind as it unites global non-convex optimization on the spherical manifold of quaternions with posterior sampling, in order to provide both reliable initial poses and uncertainty estimates that are informative about the quality of solutions. We devise theoretical convergence guarantees and extensively evaluate our method on synthetic and real benchmarks. Besides its elegance in formulation and theory, we show that our method is robust to missing data and noise, and that the estimated uncertainties capture intuitive properties of the data.

algorithm, artificial intelligence, machine learning, (13 more...)

Neural Information Processing Systems

Country:

Europe (0.46)
North America (0.46)

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Stochastic Gradient Richardson-Romberg Markov Chain Monte Carlo

Alain Durmus, Umut Simsekli, Eric Moulines, Roland Badeau, Gaël Richard

Neural Information Processing Systems | Oct-6-2024, 11:11:04 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, sgrrld, (20 more...)

Neural Information Processing Systems

Country: Europe (0.46)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)
