AITopics | Jelassi, Samy

Jelassi, Samy

Depth separation beyond radial functions

Venturi, Luca, Jelassi, Samy, Ozuch, Tristan, Bruna, Joan

arXiv.org Machine Learning, Feb-3-2021

High-dimensional depth separation results for neural networks show that certain functions can be efficiently approximated by two-hidden-layer networks but not by one-hidden-layer ones in high dimension $d$. Existing results of this type mainly focus on functions with an underlying radial or one-dimensional structure, which are usually not encountered in practice. The first contribution of this paper is to extend such results to a more general class of functions, namely functions with piece-wise oscillatory structure, by building on the proof strategy of (Eldan and Shamir, 2016). We complement these results by showing that, if the domain radius and the rate of oscillation of the objective function are constant, then approximation by one-hidden-layer networks holds at a $\mathrm{poly}(d)$ rate for any fixed error threshold. A common theme in the proof of such results is the fact that one-hidden-layer networks fail to approximate high-energy functions whose Fourier representation is spread in the domain. On the other hand, existing approximation results of a function by one-hidden-layer neural networks rely on the function having a sparse Fourier representation. The choice of the domain also represents a source of gaps between upper and lower approximation bounds. Focusing on a fixed approximation domain, namely the sphere $\mathbb{S}^{d-1}$ in dimension $d$, we characterize, in terms of their Fourier expansion, both the functions that are efficiently approximable by one-hidden-layer networks and those that are provably not.

approximation, deep learning, neural network

arXiv.org Machine Learning

2102.01621

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Adaptivity without Compromise: A Momentumized, Adaptive, Dual Averaged Gradient Method for Stochastic Optimization

Defazio, Aaron, Jelassi, Samy

arXiv.org Artificial Intelligence, Jan-26-2021

We introduce MADGRAD, a novel optimization method in the family of AdaGrad adaptive gradient methods. MADGRAD shows excellent performance on deep learning optimization problems from multiple fields, including classification and image-to-image tasks in vision, and recurrent and bidirectionally-masked models in natural language processing. For each of these tasks, MADGRAD matches or outperforms both SGD and ADAM in test set performance, even on problems for which adaptive methods normally perform poorly.

deep learning, madgrad, neural network

arXiv.org Artificial Intelligence

2101.11075

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Dual Averaging is Surprisingly Effective for Deep Learning Optimization

Jelassi, Samy, Defazio, Aaron

arXiv.org Machine Learning, Oct-20-2020

Stochastic first-order optimization methods have been extensively employed for training neural networks. It has been empirically observed that the choice of the optimization algorithm is crucial for obtaining a good accuracy score. For instance, stochastic variance-reduced methods perform poorly in computer vision (CV) (Defazio & Bottou, 2019). On the other hand, SGD with momentum (SGD+M) (Bottou, 1991; LeCun et al., 1998; Bottou & Bousquet, 2008) works particularly well on CV tasks, and Adam (Kingma & Ba, 2014) outperforms other methods on natural language processing (NLP) tasks (Choi et al., 2019). In general, the choice of optimizer, as well as its hyper-parameters, must be included among the set of hyper-parameters that are searched over when tuning.

deep learning, neural network, optimization problem

arXiv.org Machine Learning

2010.10502

Country: North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Extra-gradient with player sampling for provable fast convergence in n-player games

Domingo-Enrich, Carles, Jelassi, Samy, Scieur, Damien, Mensch, Arthur, Bruna, Joan

arXiv.org Machine Learning, Jun-4-2019

Data-driven model training increasingly relies on finding Nash equilibria with provable techniques, e.g., for GANs and multi-agent RL. In this paper, we analyse a new extra-gradient method that performs gradient extrapolations and updates on a random subset of players at each iteration. This approach provably exhibits the same rate of convergence as full extra-gradient in non-smooth convex games. We propose an additional variance reduction mechanism for this to hold for smooth convex games. Our approach makes extrapolation amenable to massive multiplayer settings, and brings empirical speed-ups, in particular when using cyclic sampling schemes. We demonstrate the efficiency of player sampling on large-scale non-smooth and non-strictly convex games. We show that the joint use of extrapolation and player sampling makes it possible to train better GANs on CIFAR10.
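
As a rough illustration of the extrapolation idea mentioned in this abstract, the sketch below compares plain simultaneous gradient descent-ascent with full extra-gradient on a toy two-player bilinear game. It is not the paper's player-sampling method: the game f(x, y) = x * y, the step size, the starting point, and all function names are assumptions chosen for illustration only.

```python
import math


def gda_step(x, y, lr):
    """One simultaneous gradient step on f(x, y) = x * y.

    Player x minimizes f (gradient y); player y maximizes f (gradient x).
    """
    return x - lr * y, y + lr * x


def extragradient_step(x, y, lr):
    """One full extra-gradient step: extrapolate, then update.

    The update from (x, y) uses gradients evaluated at the
    extrapolated point (x_half, y_half).
    """
    x_half, y_half = x - lr * y, y + lr * x
    return x - lr * y_half, y + lr * x_half


def run(step, steps=1000, lr=0.1):
    """Iterate `step` from (1, 1); return the distance to the equilibrium (0, 0)."""
    x, y = 1.0, 1.0
    for _ in range(steps):
        x, y = step(x, y, lr)
    return math.hypot(x, y)


gda_dist = run(gda_step)            # moves away from the equilibrium
eg_dist = run(extragradient_step)   # contracts toward the equilibrium
```

On this toy game each gradient descent-ascent step multiplies the squared distance to the equilibrium by 1 + lr^2, while each extra-gradient step multiplies it by 1 - lr^2 + lr^4, which is below 1 for any step size between 0 and 1.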

equation, game theory, optimization problem

arXiv.org Machine Learning

1905.12363

Genre:

Overview (1.00)
Research Report (0.64)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.46)

Global convergence of neuron birth-death dynamics

Rotskoff, Grant, Jelassi, Samy, Bruna, Joan, Vanden-Eijnden, Eric

arXiv.org Machine Learning, Feb-5-2019

Neural networks with a large number of parameters admit a mean-field description, which has recently served as a theoretical explanation for the favorable training properties of "overparameterized" models. In this regime, gradient descent obeys a deterministic partial differential equation (PDE) that converges to a globally optimal solution for networks with a single hidden layer under appropriate assumptions. In this work, we propose a non-local mass transport dynamics that leads to a modified PDE with the same minimizer. We implement this non-local dynamics as a stochastic neuronal birth-death process and we prove that it accelerates the rate of convergence in the mean-field limit. We subsequently realize this PDE with two classes of numerical schemes that converge to the mean-field equation, each of which can easily be implemented for neural networks with finite numbers of parameters. We illustrate our algorithms with two models to provide intuition for the mechanism through which convergence is accelerated.

convergence, deep learning, neural network

arXiv.org Machine Learning

1902.01843

Country: North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.36)

Smoothed analysis of the low-rank approach for smooth semidefinite programs

Pumir, Thomas, Jelassi, Samy, Boumal, Nicolas

Neural Information Processing Systems, Dec-31-2018

We consider semidefinite programs (SDPs) of size $n$ with equality constraints. In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over a matrix $Y$ of size $n\times k$ such that $X=YY^*$ is the SDP variable. The advantages of such a formulation are twofold: the dimension of the optimization variable is reduced, and positive semidefiniteness is naturally enforced. However, optimization in $Y$ is non-convex. In prior work, it has been shown that, when the constraints on the factorized variable regularly define a smooth manifold, provided $k$ is large enough, for almost all cost matrices, all second-order stationary points (SOSPs) are optimal. Importantly, in practice, one can only compute points which approximately satisfy necessary optimality conditions, leading to the question: are such points also approximately optimal? To this end, and under similar assumptions, we use smoothed analysis to show that approximate SOSPs for a randomly perturbed objective function are approximate global optima, with $k$ scaling like the square root of the number of constraints (up to log factors). We particularize our results to an SDP relaxation of phase retrieval.
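
The two advantages of the factorization $X = YY^*$ named in this abstract can be checked numerically with a minimal sketch. It assumes a real matrix $Y$ (so $Y^*$ is the transpose), and the sizes `n = 6` and `k = 2`, the random entries, and the helper names are hypothetical choices for illustration, not taken from the paper.

```python
import random


def gram(Y):
    """Return X = Y Y^T for a real n-by-k matrix Y given as a list of rows."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in Y]
            for row_i in Y]


def quad_form(X, v):
    """Return the quadratic form v^T X v."""
    size = len(v)
    return sum(v[i] * X[i][j] * v[j] for i in range(size) for j in range(size))


random.seed(0)
n, k = 6, 2  # hypothetical sizes with k much smaller than n
Y = [[random.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n)]
X = gram(Y)

# Positive semidefiniteness holds by construction:
# v^T X v = ||Y^T v||^2 >= 0 for every vector v.
min_quad = min(
    quad_form(X, [random.gauss(0.0, 1.0) for _ in range(n)])
    for _ in range(200)
)

# The optimization variable shrinks from n * n entries to n * k entries.
num_vars_full = n * n
num_vars_factored = n * k
```

Sampling random vectors only spot-checks positive semidefiniteness; the identity in the comment is what guarantees it for every vector.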

artificial intelligence, machine learning, optimization problem

Neural Information Processing Systems

Country:

North America > Canada (0.14)
Europe > Netherlands (0.14)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Smoothed analysis of the low-rank approach for smooth semidefinite programs

Pumir, Thomas, Jelassi, Samy, Boumal, Nicolas

arXiv.org Machine Learning, Jun-10-2018

We consider semidefinite programs (SDPs) of size $n$ with equality constraints. In order to overcome scalability issues, Burer and Monteiro proposed a factorized approach based on optimizing over a matrix $Y$ of size $n \times k$ such that $X = YY^*$ is the SDP variable. The advantages of such a formulation are twofold: the dimension of the optimization variable is reduced, and positive semidefiniteness is naturally enforced. However, the problem in $Y$ is non-convex. In prior work, it has been shown that, when the constraints on the factorized variable regularly define a smooth manifold, provided $k$ is large enough, for almost all cost matrices, all second-order stationary points (SOSPs) are optimal. Importantly, in practice, one can only compute points which approximately satisfy necessary optimality conditions, leading to the question: are such points also approximately optimal? To this end, and under similar assumptions, we use smoothed analysis to show that approximate SOSPs for a randomly perturbed objective function are approximate global optima, with $k$ scaling like the square root of the number of constraints (up to log factors). We particularize our results to an SDP relaxation of phase retrieval.

artificial intelligence, matrix, optimization problem

arXiv.org Machine Learning

1806.03763

Country: Europe > Netherlands (0.14)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.66)
