AITopics

2503.16398

Country: Europe > France (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.87)

arXiv.org Artificial Intelligence, Oct-28-2024

$\texttt{skwdro}$: a library for Wasserstein distributionally robust machine learning

Vincent, Florian, Azizian, Waïss, Iutzeler, Franck, Malick, Jérôme

The library is based on distributionally robust optimization using optimal transport distances. For ease of use, it features both scikit-learn compatible estimators for popular objectives, as well as a wrapper for PyTorch modules, enabling researchers and practitioners to use it in a wide range of models with minimal code changes. Its implementation relies on an entropic smoothing of the original robust objective in order to ensure maximal model flexibility. The library is available at https://github.com/iutzeler/skwdro.

Keywords: Distributionally robust optim., distribution shifts, entropic regularization
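For orientation, the two objects this abstract names (a Wasserstein robust objective and its entropic smoothing) can be written in one notation. This is a sketch only: the symbols $\hat P_n$, $\rho$, $c$, $\lambda$, $\varepsilon$ and $\pi_0$ are assumptions of this note, and the formulation the library actually implements may differ.

```latex
% Wasserstein distributionally robust objective (notation assumed here):
% \hat P_n is the empirical training distribution, W_c an optimal transport
% cost with ground cost c, and \rho \ge 0 the radius of the ambiguity set.
\min_{\theta} \; \sup_{Q \,:\, W_c(\hat P_n, Q) \le \rho} \;
  \mathbb{E}_{\xi \sim Q}\big[ f_\theta(\xi) \big]

% One common entropic smoothing of the dual problem: the inner supremum over
% perturbed samples \zeta is replaced by a log-sum-exp with parameter
% \varepsilon > 0, taken under a reference sampling distribution \pi_0.
\min_{\theta} \; \inf_{\lambda \ge 0} \; \lambda \rho
  + \varepsilon \, \mathbb{E}_{\xi \sim \hat P_n}
    \log \mathbb{E}_{\zeta \sim \pi_0(\cdot \mid \xi)}
    \exp\!\Big( \frac{f_\theta(\zeta) - \lambda \, c(\xi, \zeta)}{\varepsilon} \Big)
```

The smoothed form is differentiable in $\theta$ and $\lambda$ whenever $f_\theta$ is, which is consistent with the abstract's remark that smoothing keeps the choice of model flexible.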

artificial intelligence, machine learning, skwdro

2410.21231

Country: Europe > France (0.32)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.39)

arXiv.org Machine Learning, Jun-13-2024

What is the long-run distribution of stochastic gradient descent? A large deviations analysis

Azizian, Waïss, Iutzeler, Franck, Malick, Jérôme, Mertikopoulos, Panayotis

In this paper, we examine the long-run distribution of stochastic gradient descent (SGD) in general, non-convex problems. Specifically, we seek to understand which regions of the problem's state space are more likely to be visited by SGD, and by how much. Using an approach based on the theory of large deviations and randomly perturbed dynamical systems, we show that the long-run distribution of SGD resembles the Boltzmann-Gibbs distribution of equilibrium thermodynamics with temperature equal to the method's step-size and energy levels determined by the problem's objective and the statistics of the noise. In particular, we show that, in the long run, (a) the problem's critical region is visited exponentially more often than any non-critical region; (b) the iterates of SGD are exponentially concentrated around the problem's minimum energy state (which does not always coincide with the global minimum of the objective); (c) all other connected components of critical points are visited with frequency that is exponentially proportional to their energy level; and, finally (d) any component of local maximizers or saddle points is "dominated" by a component of local minimizers which is visited exponentially more often.
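The concentration described in this abstract can be observed on a toy problem. The sketch below is an illustration under stated assumptions, not the paper's setting: the one-dimensional objective, the Gaussian gradient noise with constant variance, and all parameter values are hypothetical. With constant noise variance the energy levels follow the objective values, so the deeper well is also the minimum energy state in this example.

```python
import random


def grad(x):
    # Derivative of the hypothetical objective f(x) = x**4/4 - x**2/2 + 0.2*x,
    # which has a deeper well near x = -1.1 and a shallower one near x = 0.9.
    return x**3 - x + 0.2


def fraction_in_deep_well(step_size, n_iters=200_000, noise_std=2.0, seed=0):
    """Run constant-step SGD and return the fraction of iterates with x < 0."""
    rng = random.Random(seed)
    x = 1.0  # start in the shallower well
    visits = 0
    for _ in range(n_iters):
        # Stochastic gradient: true gradient plus zero-mean Gaussian noise.
        noisy_grad = grad(x) + noise_std * rng.gauss(0.0, 1.0)
        x -= step_size * noisy_grad
        if x < 0.0:
            visits += 1
    return visits / n_iters


frac_large_step = fraction_in_deep_well(step_size=0.1)
frac_small_step = fraction_in_deep_well(step_size=0.05)
print(frac_large_step, frac_small_step)
```

Although the run starts in the shallower well, most iterates end up near the deeper one. Comparing the two step sizes gives a rough feel for the step-size's role as a temperature.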

artificial intelligence, machine learning, sgd

2406.09241

Country: Europe > France (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

arXiv.org Artificial Intelligence, May-27-2024

Almost sure convergence rates of stochastic gradient methods under gradient domination

Weissmann, Simon, Klein, Sara, Azizian, Waïss, Döring, Leif

First-order methods for minimizing an objective function f have played a central role in the success of machine learning. This is accompanied by a growing interest in convergence statements, particularly for stochastic gradient methods in different settings. To ensure convergence to the global optimum, some kind of convexity assumption on the objective function is required. In machine learning problems especially, the standard (strong) convexity assumption is almost never fulfilled. However, it is well known that convergence towards global optima is still achievable under a weaker assumption, namely the gradient domination property, often referred to as the Polyak-Łojasiewicz (PL) inequality [45]. In reinforcement learning as well, multiple results have shown that the objective function for policy gradient methods, under specific parametrizations, fulfills a weak type of gradient domination, so that these methods provably converge to the global optimum [11, 21, 33, 34]. Improving the understanding of rates and optimal step size choices for stochastic first-order methods is of significant interest to the machine learning and reinforcement learning communities.
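In its usual form, the PL inequality requires 0.5 * |f'(x)|^2 >= mu * (f(x) - f*) for some mu > 0. The sketch below checks this numerically on f(x) = x^2 + 3 sin^2(x), a commonly used example of a nonconvex function with this property, and then runs plain gradient descent on it. The test function, grid, step size and starting point are choices made for this illustration. Gradient noise is left out for brevity, so this is the deterministic counterpart of the stochastic methods the abstract discusses.

```python
import math


def f(x):
    # Nonconvex test function with global minimum f* = 0 at x = 0.
    return x**2 + 3.0 * math.sin(x) ** 2


def grad_f(x):
    # d/dx [x^2 + 3 sin^2(x)] = 2x + 3 sin(2x)
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def pl_ratio(x):
    # Ratio 0.5 * |f'(x)|^2 / (f(x) - f*); PL holds if this stays above mu > 0.
    return 0.5 * grad_f(x) ** 2 / f(x)


# Smallest PL ratio on a grid over [-10, 10], skipping the minimizer x = 0.
grid = [i / 100.0 for i in range(-1000, 1001) if i != 0]
pl_constant_on_grid = min(pl_ratio(x) for x in grid)

# The second derivative 2 + 6 cos(2x) is negative at x = pi/2: f is not convex.
curvature_at_half_pi = 2.0 + 6.0 * math.cos(2.0 * (math.pi / 2.0))

# Gradient descent with step 1/L, where L = 8 bounds |f''|.
x = 3.0
for _ in range(100):
    x -= grad_f(x) / 8.0
final_gap = f(x)

print(pl_constant_on_grid, curvature_at_half_pi, final_gap)
```

The ratio stays bounded away from zero on the grid even though the curvature changes sign, and gradient descent reaches the global minimum despite the lack of convexity.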

artificial intelligence, gradient domination property, machine learning

2405.13592

Country: Europe > France (0.14)

Genre: Research Report > New Finding (0.68)

Industry: Education (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.72)

arXiv.org Artificial Intelligence, Dec-15-2023

Automatic Rao-Blackwellization for Sequential Monte Carlo with Belief Propagation

Azizian, Waïss, Baudart, Guillaume, Lelarge, Marc

Exact Bayesian inference on state-space models (SSM) is in general intractable, and unfortunately, basic Sequential Monte Carlo (SMC) methods do not yield correct approximations for complex models. In this paper, we propose a mixed inference algorithm that computes closed-form solutions using belief propagation as much as possible, and falls back to sampling-based SMC methods when exact computations fail. This algorithm thus implements automatic Rao-Blackwellization and is even exact for Gaussian tree models.

algorithm, artificial intelligence, bayesian inference

2312.0986

Country: Europe > France (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)

arXiv.org Machine Learning, Nov-6-2023

Exact Generalization Guarantees for (Regularized) Wasserstein Distributionally Robust Models

Azizian, Waïss, Iutzeler, Franck, Malick, Jérôme

Wasserstein distributionally robust estimators have emerged as powerful models for prediction and decision-making under uncertainty. These estimators provide attractive generalization guarantees: the robust objective obtained from the training distribution is an exact upper bound on the true risk with high probability. However, existing guarantees either suffer from the curse of dimensionality, are restricted to specific settings, or lead to spurious error terms. In this paper, we show that these generalization guarantees actually hold on general classes of models, do not suffer from the curse of dimensionality, and can even cover distribution shifts at testing. We also prove that these results carry over to the newly-introduced regularized versions of Wasserstein distributionally robust problems.
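The guarantee described in this abstract can be written schematically as follows. The notation ($P$ for the true distribution, $\hat P_n$ for the empirical distribution of the $n$ training samples, $\rho$ for the radius, $\delta$ for the confidence level, $\Theta$ for the model class) is assumed for this sketch. The conditions on $\rho$ and $n$ under which the bound holds are not given in this summary.

```latex
% With probability at least 1 - \delta over the draw of the training samples,
% the robust training objective upper-bounds the true risk with no
% additional error term:
\mathbb{E}_{\xi \sim P}\big[ f_\theta(\xi) \big]
  \;\le\;
  \sup_{Q \,:\, W_c(\hat P_n, Q) \le \rho}
  \mathbb{E}_{\xi \sim Q}\big[ f_\theta(\xi) \big]
  \qquad \text{for all } \theta \in \Theta .
```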

artificial intelligence, machine learning, probability

2305.17076

Country:

Europe > France (0.14)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Artificial Intelligence, Aug-2-2023

The rate of convergence of Bregman proximal methods: Local geometry vs. regularity vs. sharpness

Azizian, Waïss, Iutzeler, Franck, Malick, Jérôme, Mertikopoulos, Panayotis

We examine the last-iterate convergence rate of Bregman proximal methods - from mirror descent to mirror-prox and its optimistic variants - as a function of the local geometry induced by the prox-mapping defining the method. For generality, we focus on local solutions of constrained, non-monotone variational inequalities, and we show that the convergence rate of a given method depends sharply on its associated Legendre exponent, a notion that measures the growth rate of the underlying Bregman function (Euclidean, entropic, or other) near a solution. In particular, we show that boundary solutions exhibit a stark separation of regimes between methods with a zero and non-zero Legendre exponent: the former converge at a linear rate, while the latter converge, in general, sublinearly. This dichotomy becomes even more pronounced in linearly constrained problems where methods with entropic regularization achieve a linear convergence rate along sharp directions, compared to convergence in a finite number of steps under Euclidean regularization.
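The last contrast in this abstract can be seen on a very small linearly constrained problem. The sketch below minimizes a linear function over the probability simplex, a special case chosen for illustration: the cost vector, step size and iteration count are hypothetical. It compares Euclidean projected gradient steps with entropic mirror descent steps (multiplicative updates).

```python
import math


def project_simplex(y):
    """Euclidean projection of y onto the probability simplex (sort-based)."""
    u = sorted(y, reverse=True)
    cumsum, tau = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:  # holds for a prefix of the sorted entries
            tau = t
    return [max(v - tau, 0.0) for v in y]


def euclidean_step(x, c, eta):
    # Projected gradient step: the gradient of <c, x> is the constant vector c.
    return project_simplex([xi - eta * ci for xi, ci in zip(x, c)])


def entropic_step(x, c, eta):
    # Mirror descent with the entropic regularizer: multiplicative update.
    w = [xi * math.exp(-eta * ci) for xi, ci in zip(x, c)]
    total = sum(w)
    return [wi / total for wi in w]


c = [0.0, 1.0, 2.0]  # the unique minimizer is the vertex (1, 0, 0)
eta = 0.4
x_euc = [1.0 / 3.0] * 3
x_ent = [1.0 / 3.0] * 3
for _ in range(10):
    x_euc = euclidean_step(x_euc, c, eta)
    x_ent = entropic_step(x_ent, c, eta)

print(x_euc)
print(x_ent)
```

In this example the Euclidean iterate lands exactly on the vertex after a few steps and stays there. The entropic iterate keeps all coordinates strictly positive and approaches the vertex geometrically without ever reaching it.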

artificial intelligence, machine learning

2211.08043

Country:

North America > United States (0.46)
Europe > France (0.28)

Genre: Research Report (0.83)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.92)

arXiv.org Machine Learning, Jun-28-2020

Characterizing the Expressive Power of Invariant and Equivariant Graph Neural Networks

Azizian, Waïss, Lelarge, Marc

Various classes of Graph Neural Networks (GNNs) have been proposed and shown to be successful in a wide range of applications with graph-structured data. In this paper, we propose a theoretical framework able to compare the expressive power of these GNN architectures. The current universality theorems only apply to intractable classes of GNNs. Here, we prove the first approximation guarantees for practical GNNs, paving the way for a better understanding of their generalization. Our theoretical results are proved for invariant GNNs computing a graph embedding (permutation of the nodes of the input graph does not affect the output) and equivariant GNNs computing an embedding of the nodes (permutation of the input permutes the output). We show that Folklore Graph Neural Networks (FGNNs), which are tensor-based GNNs augmented with matrix multiplication, are the most expressive architectures proposed so far for a given tensor order. We illustrate our results on the Quadratic Assignment Problem (an NP-hard combinatorial problem) by showing that FGNNs are able to learn how to solve the problem, leading to much better average performance than existing algorithms (based on spectral, SDP, or other GNN architectures). On the practical side, we also implement masked tensors to handle batches of graphs of varying sizes.
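The two symmetry notions in this abstract can be checked directly on a small graph. The sketch below uses a single sum-aggregation message-passing layer with a ReLU, which is a deliberately simple stand-in and not the FGNN architecture from the paper. The graph, features and permutation are hypothetical.

```python
def layer(adj, feats):
    """Node embeddings: ReLU(own feature + sum of neighbor features)."""
    n = len(feats)
    return [
        max(0, feats[i] + sum(adj[i][j] * feats[j] for j in range(n)))
        for i in range(n)
    ]


def readout(node_embeddings):
    """Graph embedding: sum over nodes, which ignores node ordering."""
    return sum(node_embeddings)


def permute(adj, feats, perm):
    """Relabel the graph so that new node i is old node perm[i]."""
    n = len(perm)
    adj_p = [[adj[perm[i]][perm[j]] for j in range(n)] for i in range(n)]
    feats_p = [feats[perm[i]] for i in range(n)]
    return adj_p, feats_p


# A star graph centered on node 1, with integer node features.
adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [0, 1, 0, 0],
]
feats = [2, -1, 1, 3]
perm = [2, 0, 3, 1]

h = layer(adj, feats)
adj_p, feats_p = permute(adj, feats, perm)
h_p = layer(adj_p, feats_p)

# Equivariance: permuting the input permutes the node embeddings the same way.
print(h_p == [h[p] for p in perm])
# Invariance: the graph embedding does not change under the permutation.
print(readout(h_p) == readout(h))
```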

deep learning, graph, neural network

2006.15646

Country: North America > United States (0.28)

Genre:

Research Report (0.84)
Instructional Material > Course Syllabus & Notes (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine Learning, Jun-17-2019

Lower Bounds and Conditioning of Differentiable Games

Ibrahim, Adam, Azizian, Waïss, Gidel, Gauthier, Mitliagkas, Ioannis

Many recent machine learning tools rely on differentiable game formulations. While several numerical methods have been proposed for these types of games, most of the work has been on convergence proofs or on upper bounds for the rate of convergence of those methods. In this work, we approach the question of fundamental iteration complexity by providing lower bounds. We generalise Nesterov's argument -- used in single-objective optimisation to derive a lower bound for a class of first-order black box optimisation algorithms -- to games. Moreover, we extend to games the p-SCLI framework used to derive spectral lower bounds for a large class of derivative-based single-objective optimisers. Finally, we propose a definition of the condition number arising from our lower bound analysis that matches the conditioning observed in upper bounds. Our condition number is more expressive than previously used definitions, as it covers a wide range of games, including bilinear games that lack strong convex-concavity.

artificial intelligence, condition number, machine learning

1906.073

Country: North America > Canada (0.14)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Machine Learning, Jun-13-2019

A Tight and Unified Analysis of Extragradient for a Whole Spectrum of Differentiable Games

Azizian, Waïss, Mitliagkas, Ioannis, Lacoste-Julien, Simon, Gidel, Gauthier

We consider differentiable games: multi-objective minimization problems, where the goal is to find a Nash equilibrium. The machine learning community has recently started using extrapolation-based variants of the gradient method. A prime example is the extragradient, which yields linear convergence in cases like bilinear games, where the standard gradient method fails. The full benefits of extrapolation-based methods are not known: i) there is no unified analysis for a large class of games that includes both strongly monotone and bilinear games; ii) it is not known whether the rate achieved by extragradient can be improved, e.g. by considering multiple extrapolation steps. We answer these questions through new analysis of the extragradient's local and global convergence properties. Our analysis covers the whole range of settings between purely bilinear and strongly monotone games. It reveals that extragradient converges via different mechanisms at these extremes; in between, it exploits the most favorable mechanism for the given problem. We then present lower bounds on the rate of convergence for a wide class of algorithms with any number of extrapolations. Our bounds prove that the extragradient achieves the optimal rate in this class, and that our upper bounds are tight. Our precise characterization of the extragradient's convergence behavior in games shows that, unlike in convex optimization, the extragradient method may be much faster than the gradient method.
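The bilinear case mentioned in this abstract is easy to reproduce. The sketch below uses the scalar game min over x, max over y of x * y, whose unique equilibrium is (0, 0), and compares simultaneous gradient descent-ascent with the extragradient update. The step size, starting point and iteration count are choices made for this illustration.

```python
import math


def gda_step(x, y, eta):
    # Simultaneous gradient descent on x and ascent on y for the payoff x * y.
    return x - eta * y, y + eta * x


def extragradient_step(x, y, eta):
    # Extrapolation: take a gradient step to get a look-ahead point ...
    x_half, y_half = x - eta * y, y + eta * x
    # ... then update the original point using the look-ahead gradients.
    return x - eta * y_half, y + eta * x_half


def distance_to_equilibrium(step, x0=1.0, y0=1.0, eta=0.5, iters=50):
    x, y = x0, y0
    for _ in range(iters):
        x, y = step(x, y, eta)
    return math.hypot(x, y)


gda_norm = distance_to_equilibrium(gda_step)
eg_norm = distance_to_equilibrium(extragradient_step)
print(gda_norm, eg_norm)
```

Each gradient step multiplies the squared distance to the equilibrium by 1 + eta^2, so the iterates spiral outward. Each extragradient step multiplies it by 1 - eta^2 + eta^4, which is below 1 for 0 < eta < 1, so the iterates contract at a linear rate.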

artificial intelligence, eigenvalue, optimization problem

1906.05945

Country:

North America > Canada (0.14)
Europe > United Kingdom (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)