AITopics | Gidel, Gauthier

Gidel, Gauthier

A Tight and Unified Analysis of Extragradient for a Whole Spectrum of Differentiable Games

Azizian, Waïss, Mitliagkas, Ioannis, Lacoste-Julien, Simon, Gidel, Gauthier

arXiv.org Machine Learning, Jun-13-2019

We consider differentiable games: multi-objective minimization problems, where the goal is to find a Nash equilibrium. The machine learning community has recently started using extrapolation-based variants of the gradient method. A prime example is the extragradient, which yields linear convergence in cases like bilinear games, where the standard gradient method fails. The full benefits of extrapolation-based methods are not known: i) there is no unified analysis for a large class of games that includes both strongly monotone and bilinear games; ii) it is not known whether the rate achieved by extragradient can be improved, e.g. by considering multiple extrapolation steps. We answer these questions through new analysis of the extragradient's local and global convergence properties. Our analysis covers the whole range of settings between purely bilinear and strongly monotone games. It reveals that extragradient converges via different mechanisms at these extremes; in between, it exploits the most favorable mechanism for the given problem. We then present lower bounds on the rate of convergence for a wide class of algorithms with any number of extrapolations. Our bounds prove that the extragradient achieves the optimal rate in this class, and that our upper bounds are tight. Our precise characterization of the extragradient's convergence behavior in games shows that, unlike in convex optimization, the extragradient method may be much faster than the gradient method.
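
The contrast the abstract draws between the gradient method and extragradient on bilinear games can be illustrated with a minimal sketch. It assumes the simplest such game, min over x and max over y of f(x, y) = x * y, with a hypothetical step size of 0.5 and starting point (1, 1); none of these choices come from the paper, and the function names are illustrative only.

```python
import math

def gradient_step(x, y, eta):
    # Simultaneous gradient descent (in x) and ascent (in y) on f(x, y) = x * y.
    return x - eta * y, y + eta * x

def extragradient_step(x, y, eta):
    # Extrapolation: take a trial gradient step to a look-ahead point.
    x_half, y_half = x - eta * y, y + eta * x
    # Update: move from the ORIGINAL point using gradients at the look-ahead point.
    return x - eta * y_half, y + eta * x_half

def run(step, steps=100, eta=0.5):
    # Hypothetical starting point (1, 1); returns the distance to the equilibrium (0, 0).
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step(x, y, eta)
    return math.hypot(x, y)

gd_norm = run(gradient_step)        # grows: each step scales the norm by sqrt(1 + eta^2)
eg_norm = run(extragradient_step)   # shrinks: each step scales it by sqrt(1 - eta^2 + eta^4)
```

On this toy problem the gradient iterates spiral away from the equilibrium while the extragradient iterates contract towards it, which is the qualitative behavior the abstract describes; the sketch says nothing about the rates proved in the paper.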

artificial intelligence, eigenvalue, optimization problem, (15 more...)

arXiv:1906.05945

Country:

North America > Canada (0.14)
Europe > United Kingdom (0.14)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)

A Closer Look at the Optimization Landscapes of Generative Adversarial Networks

Berard, Hugo, Gidel, Gauthier, Almahairi, Amjad, Vincent, Pascal, Lacoste-Julien, Simon

arXiv.org Machine Learning, Jun-11-2019

Generative adversarial networks have been very successful in generative modeling; however, they remain relatively hard to optimize compared to standard deep neural networks. In this paper, we try to gain insight into the optimization of GANs by looking at the game vector field resulting from the concatenation of the gradients of both players. Based on this point of view, we propose visualization techniques that allow us to make the following empirical observations. First, the training of GANs suffers from rotational behavior around locally stable stationary points, which, as we show, corresponds to the presence of imaginary components in the eigenvalues of the Jacobian of the game. Second, GAN training seems to converge to a stable stationary point which is a saddle point for the generator loss, not a minimum, while still achieving excellent performance. This counter-intuitive yet persistent observation questions whether we actually need a Nash equilibrium to get good performance in GANs.
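
The link between rotation and imaginary eigenvalues can be seen on a two-parameter toy game; this is a sketch under assumptions, not the paper's visualization technique. It assumes f(x, y) = (mu/2) x^2 + x y - (mu/2) y^2 with player 1 minimizing over x and player 2 maximizing over y, so the game vector field is v(x, y) = (mu x + y, -x + mu y). The value mu = 0.1 and the helper name are hypothetical.

```python
import cmath

def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return (trace + root) / 2.0, (trace - root) / 2.0

# Jacobian of v(x, y) = (mu*x + y, -x + mu*y); mu is an assumed illustrative value.
mu = 0.1
jacobian = (mu, 1.0, -1.0, mu)
lam_plus, lam_minus = eigenvalues_2x2(*jacobian)
# The eigenvalues are mu +/- i: the real part reflects attraction to the
# stationary point, the imaginary part reflects rotation around it.
```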

deep learning, eigenvalue, neural network, (21 more...)

arXiv:1906.04848

Country: North America > Canada (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics

Kerg, Giancarlo, Goyette, Kyle, Touzel, Maximilian Puelma, Gidel, Gauthier, Vorontsov, Eugene, Bengio, Yoshua, Lajoie, Guillaume

arXiv.org Artificial Intelligence, May-28-2019

A recent strategy to circumvent the exploding and vanishing gradient problem in RNNs, and to allow the stable propagation of signals over long time scales, is to constrain recurrent connectivity matrices to be orthogonal or unitary. This ensures eigenvalues with unit norm and thus stable dynamics and training. However, this comes at the cost of reduced expressivity due to the limited variety of orthogonal transformations. We propose a novel connectivity structure based on the Schur decomposition and a splitting of the Schur form into normal and non-normal parts. This allows us to parametrize matrices with unit-norm eigenspectra without orthogonality constraints on eigenbases. The resulting architecture ensures access to a larger space of spectrally constrained matrices, of which orthogonal matrices are a subset. This crucial difference retains the stability advantages and training speed of orthogonal RNNs while enhancing expressivity, especially on tasks that require computations over ongoing input sequences.

deep learning, matrix, neural network, (18 more...)

arXiv.org Artificial Intelligence

1905.1208

Country: North America > United States > New York (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Painless Stochastic Gradient: Interpolation, Line-Search, and Convergence Rates

Vaswani, Sharan, Mishkin, Aaron, Laradji, Issam, Schmidt, Mark, Gidel, Gauthier, Lacoste-Julien, Simon

arXiv.org Machine Learning, May-23-2019

Recent works have shown that stochastic gradient descent (SGD) achieves the fast convergence rates of full-batch gradient descent for over-parameterized models satisfying certain interpolation conditions. However, the step-size used in these works depends on unknown quantities, and SGD's practical performance heavily relies on the choice of the step-size. We propose to use line-search methods to automatically set the step-size when training models that can interpolate the data. We prove that SGD with the classic Armijo line-search attains the fast convergence rates of full-batch gradient descent in convex and strongly-convex settings. We also show that under additional assumptions, SGD with a modified line-search can attain a fast rate of convergence for non-convex functions. Furthermore, we show that a stochastic extra-gradient method with a Lipschitz line-search attains fast convergence rates for an important class of non-convex functions and saddle-point problems satisfying interpolation. We then give heuristics to use larger step-sizes and acceleration with our line-search techniques. We compare the proposed algorithms against numerous optimization methods for standard classification tasks using both kernel methods and deep networks. The proposed methods are robust and result in competitive performance across all models and datasets.
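
A minimal sketch of SGD with a backtracking Armijo line-search, evaluated on the sampled example only, may help make the idea concrete. It assumes a noiseless linear regression problem (so the model can interpolate the data) and hypothetical constants eta_max = 1.0, c = 0.1 and backtracking factor beta = 0.5; the function names and the synthetic data are illustrative and are not taken from the paper.

```python
import random

def loss_and_grad(w, x, y):
    # Squared loss of a linear model on a single example, and its gradient in w.
    residual = sum(wi * xi for wi, xi in zip(w, x)) - y
    return 0.5 * residual ** 2, [residual * xi for xi in x]

def armijo_sgd(data, dim, epochs=50, eta_max=1.0, c=0.1, beta=0.5, seed=0):
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        for x, y in rng.sample(data, len(data)):  # one shuffled pass over the data
            loss, grad = loss_and_grad(w, x, y)
            grad_sq = sum(g * g for g in grad)
            if grad_sq == 0.0:
                continue  # this example is already fit exactly
            eta = eta_max
            while True:
                # Stochastic Armijo condition, checked on the same example:
                # f_i(w - eta * g) <= f_i(w) - c * eta * ||g||^2
                trial = [wi - eta * gi for wi, gi in zip(w, grad)]
                trial_loss, _ = loss_and_grad(trial, x, y)
                if trial_loss <= loss - c * eta * grad_sq:
                    break
                eta *= beta  # backtrack: shrink the step and retry
            w = trial
    return w

# Synthetic interpolation setting: labels come from w_true with no noise.
data_rng = random.Random(1)
w_true = [2.0, -1.0]
data = []
for _ in range(20):
    x = [data_rng.uniform(-1.0, 1.0), data_rng.uniform(-1.0, 1.0)]
    data.append((x, sum(wt * xi for wt, xi in zip(w_true, x))))

w_hat = armijo_sgd(data, dim=2)
```

No step-size is tuned by hand here: the line-search picks one per example, which is the practical point the abstract makes. Resetting the step to eta_max at every iteration is a simplification chosen for this sketch.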

artificial intelligence, interpolation, machine learning, (15 more...)

arXiv:1905.09997

Country:

Europe (0.67)
North America > United States (0.28)
North America > Canada > Quebec (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Implicit Regularization of Discrete Gradient Dynamics in Deep Linear Neural Networks

Gidel, Gauthier, Bach, Francis, Lacoste-Julien, Simon

arXiv.org Machine Learning, Apr-30-2019

When optimizing over-parameterized models, such as deep neural networks, a large set of parameters leads to a zero training error. However, they lead to different values for the test error and thus have distinct generalization properties. More specifically, Neyshabur [2017, Part II] argues that the choice of the optimization algorithm (and its respective hyperparameters) provides an implicit regularization with respect to its geometry: it biases the training, finding a particular minimizer of the objective. In this work, we use the same setting as Saxe et al. [2018]: a regression problem with least-square loss on a multidimensional output. Our prediction is made either by a linear model or by a two-layer linear neural network [Saxe et al., 2018]. Our goal is to extend their work on the continuous gradient dynamics in order to understand the behavior of the discrete dynamics induced by these two models. We show that with a vanishing initialization and a small enough step-size, the gradient dynamics of the two-layer linear neural network sequentially learns components that can be ranked according to a hierarchical structure, whereas the gradient dynamics of the linear model learns the same components at the same time, missing this notion of hierarchy between components. The path followed by the two-layer formulation actually corresponds to successively solving the initial regression problem with a growing low-rank constraint, which is also known as reduced-rank regression [Izenman, 1975]. Note that this notion of path followed by the dynamics of a whole network is different from the notion of path introduced by Neyshabur et al. [2015a] which

deep learning, implicit regularization, neural network, (18 more...)

arXiv:1904.13262

Country:

North America > Canada (0.28)
Europe (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Reducing Noise in GAN Training with Variance Reduced Extragradient

Chavdarova, Tatjana, Gidel, Gauthier, Fleuret, François, Lacoste-Julien, Simon

arXiv.org Machine Learning, Apr-18-2019

Using large mini-batches when training generative adversarial networks (GANs) has been recently shown to significantly improve the quality of the generated samples. This can be seen as a simple but computationally expensive way of reducing the noise of the gradient estimates. In this paper, we investigate the effect of the noise in this context and show that it can prevent the convergence of standard stochastic game optimization methods, while their respective batch version converges. To address this issue, we propose a variance-reduced version of the stochastic extragradient algorithm (SVRE). We show experimentally that it performs similarly to a batch method, while being computationally cheaper, and show its theoretical convergence, improving upon the best rates proposed in the literature. Experiments on several datasets show that SVRE improves over baselines. Notably, SVRE is the first optimization method for GANs to our knowledge that can produce near state-of-the-art results without using adaptive step-size such as Adam.

deep learning, neural network, variance, (19 more...)

arXiv:1904.08598

Country: North America > Canada (0.14)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Negative Momentum for Improved Game Dynamics

Gidel, Gauthier, Hemmat, Reyhane Askari, Pezeshki, Mohammad, Huang, Gabriel, Lepriol, Remi, Lacoste-Julien, Simon, Mitliagkas, Ioannis

arXiv.org Machine Learning, Jul-12-2018

Games generalize the optimization paradigm by introducing different objective functions for different optimizing agents, known as players. Generative Adversarial Networks (GANs) are arguably the most popular game formulation in recent machine learning literature. GANs achieve great results on generating realistic natural images; however, they are known for being difficult to train. Training them involves finding a Nash equilibrium, typically performed using gradient descent on the two players' objectives. Game dynamics can induce rotations that slow down convergence to a Nash equilibrium, or prevent it altogether. We provide a theoretical analysis of the game dynamics. Our analysis, supported by experiments, shows that gradient descent with a negative momentum term can improve the convergence properties of some GANs.
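
The effect of a negative momentum term on rotational dynamics can be sketched on a toy problem. The example below assumes the bilinear game min over x, max over y of f(x, y) = x * y, alternating updates (the second player reacts to the first player's new iterate), a hypothetical step size eta = 0.5 and momentum beta = -0.5, and a starting point of (1, 1). These choices are made for illustration and are not claimed to be the paper's experimental setup.

```python
import math

def alternating_momentum(eta, beta, steps=500):
    # Heavy-ball style updates on f(x, y) = x * y, starting from rest at (1, 1).
    x, y = 1.0, 1.0
    x_prev, y_prev = x, y
    for _ in range(steps):
        # Player 1 descends in x, with momentum beta * (x - x_prev).
        x_new = x - eta * y + beta * (x - x_prev)
        # Player 2 ascends in y using the UPDATED x (alternating scheme).
        y_new = y + eta * x_new + beta * (y - y_prev)
        x_prev, y_prev, x, y = x, y, x_new, y_new
    return math.hypot(x, y)  # distance to the equilibrium (0, 0)

plain_norm = alternating_momentum(eta=0.5, beta=0.0)      # no momentum: keeps circling
negative_norm = alternating_momentum(eta=0.5, beta=-0.5)  # negative momentum: contracts
```

With beta = 0 the alternating iterates stay on a closed orbit around the equilibrium, whereas with beta = -0.5 they spiral inwards on this toy game. Whether the same choice helps on an actual GAN is an empirical question the sketch does not address.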

eigenvalue, neural network, optimization problem, (17 more...)

arXiv.org Machine Learning

1807.0474

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.47)
Education > Curriculum (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.69)

Parametric Adversarial Divergences are Good Task Losses for Generative Modeling

Huang, Gabriel, Berard, Hugo, Touati, Ahmed, Gidel, Gauthier, Vincent, Pascal, Lacoste-Julien, Simon

arXiv.org Machine Learning, Jun-27-2018

Generative modeling of high dimensional data like images is a notoriously difficult and ill-defined problem. In particular, how to evaluate a learned generative model is unclear. In this position paper, we argue that adversarial learning, pioneered with generative adversarial networks (GANs), provides an interesting framework to implicitly define more meaningful task losses for generative modeling tasks, such as for generating "visually realistic" images. We refer to those task losses as parametric adversarial divergences and we give two main reasons why we think parametric divergences are good learning objectives for generative modeling. Additionally, we unify the processes of choosing a good structured loss (in structured prediction) and choosing a discriminator architecture (in generative modeling) using statistical decision theory; we are then able to formalize and quantify the intuition that "weaker" losses are easier to learn from, in a specific setting. Finally, we propose two new challenging tasks to evaluate parametric and nonparametric divergences: a qualitative task of generating very high-resolution digits, and a quantitative task of learning data that satisfies high-level algebraic constraints. We use two common divergences to train a generator and show that the parametric divergence outperforms the nonparametric divergence on both the qualitative and the quantitative task.

artificial intelligence, divergence, neural network, (15 more...)

arXiv:1708.02511

Country: North America > Canada (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Frank-Wolfe Splitting via Augmented Lagrangian Method

Gidel, Gauthier, Pedregosa, Fabian, Lacoste-Julien, Simon

arXiv.org Machine Learning, Apr-9-2018

Minimizing a function over an intersection of convex sets is an important task in optimization that is often much more challenging than minimizing it over each individual constraint set. While traditional methods such as Frank-Wolfe (FW) or proximal gradient descent assume access to a linear or quadratic oracle on the intersection, splitting techniques take advantage of the structure of each set, and only require access to the oracle on the individual constraints. In this work, we develop and analyze the Frank-Wolfe Augmented Lagrangian (FW-AL) algorithm, a method for minimizing a smooth function over convex compact sets related by a "linear consistency" constraint that only requires access to a linear minimization oracle over the individual constraints. It is based on the Augmented Lagrangian Method (ALM), also known as the Method of Multipliers, but unlike most existing splitting methods, it only requires access to linear (instead of quadratic) minimization oracles. We use recent advances in the analysis of Frank-Wolfe and the alternating direction method of multipliers algorithms to prove a sublinear convergence rate for FW-AL over general convex compact sets and a linear convergence rate for polytopes.

algorithm, health & medicine, optimization problem, (17 more...)

arXiv:1804.03176

Country:

Europe (0.45)
North America > Canada (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

A Variational Inequality Perspective on Generative Adversarial Nets

Gidel, Gauthier, Berard, Hugo, Vincent, Pascal, Lacoste-Julien, Simon

arXiv.org Machine Learning, Feb-28-2018

Stability has been a recurrent issue in training generative adversarial networks (GANs). One common way to tackle this issue has been to propose new formulations of the GAN objective. Yet, surprisingly few studies have looked at optimization methods specifically designed for this adversarial training. In this work, we review the "variational inequality" framework which contains most formulations of the GAN objective introduced so far. Tapping into the mathematical programming literature, we counter some common misconceptions about the difficulties of saddle point optimization and propose to extend standard methods designed for variational inequalities to GAN training, such as a stochastic version of the extragradient method, and empirically investigate their behavior on GANs.

optimization problem, survey article, variational inequality, (18 more...)

arXiv:1802.10551

Country: North America > Canada > Quebec > Montreal (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
