Fercoq, Olivier
Escaping limit cycles: Global convergence for constrained nonconvex-nonconcave minimax problems
Pethick, Thomas, Latafat, Puya, Patrinos, Panagiotis, Fercoq, Olivier, Cevher, Volkan
This paper introduces a new extragradient-type algorithm for a class of nonconvex-nonconcave minimax problems. It is well known that finding a local solution for general minimax problems is computationally intractable. This observation has recently motivated the study of structures sufficient for convergence of first-order methods in the more general setting of variational inequalities when the so-called weak Minty variational inequality (MVI) holds. This problem class captures non-trivial structures, as we demonstrate with examples for which a large family of existing algorithms provably converges to limit cycles. Our results require a less restrictive parameter range in the weak MVI than was previously known, thus extending the applicability of our scheme. The proposed algorithm is applicable to constrained and regularized problems and involves an adaptive stepsize allowing for potentially larger stepsizes. Our scheme also converges globally even in settings where the underlying operator exhibits limit cycles.
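As a point of reference, the weak MVI assumption is commonly stated as follows (a background sketch; the exact scaling of the constant varies between papers, and the precise parameter range is part of the paper's contribution):

\[
\exists\, z^\star \in \mathcal{Z} \;\; \text{such that} \;\; \langle F(z),\, z - z^\star \rangle \,\ge\, \rho\, \lVert F(z) \rVert^2 \quad \text{for all } z \in \mathcal{Z},
\]

where $F$ is the operator associated with the minimax problem and $\rho \le 0$ is allowed; $\rho = 0$ recovers the classical (star-)Minty condition, while more negative $\rho$ permits a higher degree of nonmonotonicity.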
Solving stochastic weak Minty variational inequalities without increasing batch size
Pethick, Thomas, Fercoq, Olivier, Latafat, Puya, Patrinos, Panagiotis, Cevher, Volkan
This paper introduces a family of stochastic extragradient-type algorithms for a class of nonconvex-nonconcave problems characterized by the weak Minty variational inequality (MVI). Unlike existing results on extragradient methods in the monotone setting, employing diminishing stepsizes is no longer possible in the weak MVI setting. This has led to approaches such as increasing batch sizes per iteration, which can, however, be prohibitively expensive. In contrast, our proposed method involves two stepsizes and only requires one additional oracle evaluation per iteration. We show that it is possible to keep one fixed stepsize while only the second stepsize is taken to be diminishing, making the scheme interesting even in the monotone setting. Almost sure convergence is established, and we provide a unified analysis for this family of schemes which contains a nonlinear generalization of the celebrated primal-dual hybrid gradient algorithm. Stochastic first-order methods have been at the core of the current success in deep learning applications. These methods are by now mostly well understood for minimization problems. This is even the case in the nonconvex setting, where there exist matching upper and lower bounds on the complexity of finding an approximately stable point (Arjevani et al., 2019). The picture becomes less clear when moving beyond minimization into nonconvex-nonconcave minimax problems, or more generally nonmonotone variational inequalities. Even in the deterministic case, finding a stationary point is in general intractable (Daskalakis et al., 2021; Hirsch & Vavasis, 1987). This is in stark contrast with minimization, where only global optimality is NP-hard. An interesting nonmonotone class for which we do have efficient algorithms is characterized by the so-called weak Minty variational inequality (MVI) (Diakonikolas et al., 2021). This problem class captures nontrivial structures such as attracting limit cycles and is governed by a parameter ρ whose negativity increases the degree of nonmonotonicity. In other words, it seems that we need to take the stepsize γ large to guarantee convergence for a large class. This reliance on a large stepsize is at the core of why the community has struggled to provide stochastic variants for weak MVIs. The only known results effectively increase the batch size at every iteration (Diakonikolas et al., 2021). Pethick et al. (2022) proposed (SEG+), which attempts to tackle the noise by only diminishing the second stepsize.
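To make the two-stepsize idea concrete, here is a minimal, hedged sketch of such an update in Python (an illustrative scheme with a fixed exploration stepsize gamma and a diminishing second stepsize alpha_k; it is not the exact algorithm or stepsize rule analyzed in the paper):

```python
import numpy as np

def stochastic_extragradient_two_stepsizes(F_stoch, z0, gamma, alphas, num_iter, rng):
    """Schematic two-stepsize stochastic extragradient update.

    F_stoch(z, rng) returns a stochastic estimate of the operator F at z;
    gamma is the fixed exploration stepsize and alphas[k] is the (possibly
    diminishing) second stepsize. Illustrative sketch only.
    """
    z = np.asarray(z0, dtype=float)
    for k in range(num_iter):
        z_bar = z - gamma * F_stoch(z, rng)              # exploration step, fixed stepsize
        z = z - alphas[k] * gamma * F_stoch(z_bar, rng)  # update step, diminishing stepsize
    return z

# Toy usage: bilinear saddle point min_x max_y x*y, with F(x, y) = (y, -x) plus noise.
rng = np.random.default_rng(0)
F = lambda z, rng: np.array([z[1], -z[0]]) + 0.01 * rng.standard_normal(2)
alphas = [1.0 / np.sqrt(k + 1) for k in range(1000)]
z_out = stochastic_extragradient_two_stepsizes(F, [1.0, 1.0], 0.5, alphas, 1000, rng)
```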
Screening Rules and its Complexity for Active Set Identification
Ndiaye, Eugene, Fercoq, Olivier, Salmon, Joseph
In learning problems involving a large number of variables, sparse models such as the Lasso and Support Vector Machines (SVM) allow one to select the most important variables. For instance, the Lasso estimator depends only on a subset of features that have maximal absolute correlation with the residual, whereas the SVM classifier depends only on a subset of samples (the support vectors) that characterize the margin. The remaining features/variables make no contribution to the optimal solution. Thus, early detection of those non-influential variables can lead to significant simplifications of the problem and savings in memory and computational resources. Noticeable examples are the facial reduction preprocessing steps used for accelerating linear programming solvers [4, 22] and conic programming [3]; we refer to [6, 25] for recent reviews.
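As a small numerical illustration of the Lasso statement above (using scikit-learn's Lasso, whose objective scales the quadratic term by 1/(2n); an illustrative check, not part of the paper):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Numerically check the Lasso optimality conditions: with objective
# (1/(2n)) * ||y - X b||^2 + alpha * ||b||_1 (scikit-learn's convention),
# any feature j with |x_j^T (y - X b)| / n strictly below alpha is inactive (b_j = 0).
rng = np.random.default_rng(0)
n, p = 100, 50
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:5] = 1.0
y = X @ beta_true + 0.1 * rng.standard_normal(n)

alpha = 0.1
model = Lasso(alpha=alpha, fit_intercept=False, max_iter=10000).fit(X, y)
corr = np.abs(X.T @ (y - X @ model.coef_)) / n

inactive = corr < alpha - 1e-8               # strictly below the threshold
print(np.all(model.coef_[inactive] == 0.0))  # expected: True
```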
Random extrapolation for primal-dual coordinate descent
Alacaoglu, Ahmet, Fercoq, Olivier, Cevher, Volkan
We introduce a randomly extrapolated primal-dual coordinate descent method that adapts to sparsity of the data matrix and to favorable structures of the objective function. Our method updates only a subset of primal and dual variables with sparse data, and it uses large step sizes with dense data, retaining the benefits of the specific methods designed for each case. In addition to adapting to sparsity, our method attains fast convergence guarantees in favorable cases without any modifications. In particular, we prove linear convergence under metric subregularity, which applies to strongly convex-strongly concave problems and piecewise linear quadratic functions. We show almost sure convergence of the sequence and optimal sublinear convergence rates for the primal-dual gap and objective values in the general convex-concave case. Numerical evidence demonstrates the state-of-the-art empirical performance of our method in sparse and dense settings, matching and improving on existing methods.
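For orientation, the deterministic primal-dual hybrid gradient (PDHG) template that such coordinate methods randomize reads as follows (a background sketch for $\min_x \max_y \langle Ax, y\rangle + g(x) - f^*(y)$ with proximable $g$ and $f^*$; the paper replaces the full-vector updates and the extrapolation step with randomized coordinate-wise counterparts):

\[
\begin{aligned}
y^{k+1} &= \operatorname{prox}_{\sigma f^*}\!\big(y^k + \sigma A \bar{x}^k\big),\\
x^{k+1} &= \operatorname{prox}_{\tau g}\!\big(x^k - \tau A^\top y^{k+1}\big),\\
\bar{x}^{k+1} &= x^{k+1} + \theta\,\big(x^{k+1} - x^k\big).
\end{aligned}
\]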
Joint quantile regression in vector-valued RKHSs
Sangnier, Maxime, Fercoq, Olivier, d'Alché-Buc, Florence
Addressing the need for a more complete picture than the average relationship provided by standard regression, a novel framework for estimating and predicting several conditional quantiles simultaneously is introduced. The proposed methodology leverages kernel-based multi-task learning to curb the embarrassing phenomenon of quantile crossing, with a one-step estimation procedure and no post-processing. Moreover, this framework comes with theoretical guarantees and an efficient coordinate descent learning algorithm. Numerical experiments on benchmark and real datasets highlight the improvements of our approach in terms of prediction error, crossing occurrences and training time.
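For context, a single conditional quantile of level $\tau \in (0,1)$ is classically estimated by minimizing the pinball loss (stated here as background, not as the paper's exact vector-valued objective):

\[
\ell_\tau(r) \;=\; \max\big(\tau\, r,\ (\tau - 1)\, r\big), \qquad r = y - f(x),
\]

and joint estimation of several levels $\tau_1 < \dots < \tau_m$ is what creates the risk of quantile crossing that the proposed multi-task formulation is designed to curb.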
GAP Safe screening rules for sparse multi-task and multi-class models
Ndiaye, Eugene, Fercoq, Olivier, Gramfort, Alexandre, Salmon, Joseph
Screening rules leverage the known sparsity of the solution by ignoring some variables in the optimization, hence speeding up solvers. When the procedure is proven not to discard features wrongly, the rules are said to be safe. In this paper we derive new safe rules for generalized linear models regularized with L1 and L1/L2 norms. The rules are based on duality gap computations and spherical safe regions whose diameters converge to zero. This makes it possible to safely discard more variables, in particular for low regularization parameters.
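In the plain Lasso case, writing the primal as $P_\lambda(\beta) = \tfrac{1}{2}\lVert y - X\beta\rVert^2 + \lambda \lVert \beta \rVert_1$ and given any dual-feasible point $\theta$, the Gap Safe sphere test takes the following form (a sketch for this particular scaling of the loss; constants change with other formulations):

\[
\lvert x_j^\top \theta \rvert \;+\; \lVert x_j \rVert_2\, \frac{\sqrt{2\,\mathrm{Gap}_\lambda(\beta, \theta)}}{\lambda} \;<\; 1 \quad \Longrightarrow \quad \hat{\beta}_j = 0,
\]

so that the set of screened features grows as the duality gap, and hence the sphere radius, shrinks along the iterations.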
Stochastic Conditional Gradient Method for Composite Convex Minimization
Locatello, Francesco, Yurtsever, Alp, Fercoq, Olivier, Cevher, Volkan
In this paper, we propose the first practical algorithm to minimize stochastic composite optimization problems over compact convex sets. This template allows for affine constraints and therefore covers stochastic semidefinite programs (SDPs), which are vastly applicable in both machine learning and statistics. In this setup, stochastic algorithms with convergence guarantees are either not known or not tractable. We tackle this general problem and propose a convergent, easy-to-implement and tractable algorithm. We prove an $\mathcal{O}(k^{-1/3})$ convergence rate in expectation on the objective residual and $\mathcal{O}(k^{-5/12})$ in expectation on the feasibility gap. These rates are achieved without increasing the batch size, which can contain a single sample. We present extensive empirical evidence demonstrating the superiority of our algorithm on a broad range of applications, including optimization of stochastic SDPs.
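The single-sample gradient-averaging building block of such conditional gradient methods can be sketched as follows (an illustrative Python sketch only; the paper's method additionally handles affine constraints via smoothing and a homotopy, and its stepsize/averaging schedules differ):

```python
import numpy as np

def stochastic_frank_wolfe(grad_stoch, lmo, x0, num_iter, rng):
    """Schematic stochastic Frank-Wolfe / conditional gradient sketch.

    grad_stoch(x, rng): unbiased stochastic gradient of the smooth part.
    lmo(d): linear minimization oracle, argmin over the feasible set of <d, s>.
    """
    x = np.asarray(x0, dtype=float)
    d = np.zeros_like(x)
    for k in range(num_iter):
        rho = 4.0 / (k + 8) ** (2.0 / 3.0)            # averaging weight (illustrative schedule)
        eta = 2.0 / (k + 2)                            # Frank-Wolfe stepsize
        d = (1 - rho) * d + rho * grad_stoch(x, rng)   # running gradient estimate
        s = lmo(d)                                     # vertex returned by the oracle
        x = x + eta * (s - x)                          # convex combination keeps x feasible
    return x

# Toy usage: minimize E ||x - (a + noise)||^2 over the probability simplex.
rng = np.random.default_rng(0)
a = np.array([0.1, 0.7, 0.2])
grad = lambda x, rng: 2.0 * (x - (a + 0.05 * rng.standard_normal(3)))
lmo = lambda d: np.eye(3)[np.argmin(d)]               # simplex vertex minimizing <d, s>
x_out = stochastic_frank_wolfe(grad, lmo, np.ones(3) / 3.0, 2000, rng)
```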
Safe Grid Search with Optimal Complexity
Ndiaye, Eugene, Le, Tam, Fercoq, Olivier, Salmon, Joseph, Takeuchi, Ichiro
Popular machine learning estimators involve regularization parameters that can be challenging to tune, and standard strategies rely on grid search for this task. In this paper, we revisit techniques for approximating the regularization path up to a predefined tolerance $\epsilon$ in a unified framework and show that its complexity is $O(1/\sqrt[d]{\epsilon})$ for uniformly convex losses of order $d>0$ and $O(1/\sqrt{\epsilon})$ for generalized self-concordant functions. This framework encompasses least squares but also logistic regression (a case that, to the best of our knowledge, was not handled as precisely by previous works). We leverage our technique to provide refined bounds on the validation error as well as a practical algorithm for hyperparameter tuning. The latter has a global convergence guarantee when targeting a prescribed accuracy on the validation set. Last but not least, our approach helps relieve the practitioner from the (often neglected) task of selecting a stopping criterion when optimizing over the training set: our method automatically calibrates it based on the targeted accuracy on the validation set.
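The overall structure of such an approximation-path construction can be sketched as follows (a hedged Python sketch; solve and gap are hypothetical, problem-specific helpers, and the paper derives the admissible step in closed form rather than by trial shrinking):

```python
def approximate_path(lmbda_max, lmbda_min, eps, solve, gap, shrink=0.95):
    """Schematic adaptive grid over the regularization path.

    solve(lmbda, w_init, tol) returns an approximate solution at lmbda;
    gap(w, lmbda) returns the duality gap of w for the problem at lmbda.
    """
    path = []
    lmbda, w = lmbda_max, None
    while lmbda >= lmbda_min:
        w = solve(lmbda, w, eps / 2.0)                 # warm-started solve to eps/2 accuracy
        path.append((lmbda, w))
        nxt = lmbda * shrink                           # always make one geometric step
        while nxt > lmbda_min and gap(w, nxt * shrink) <= eps:
            nxt *= shrink                              # keep going while w stays certified eps-optimal
        lmbda = nxt
    return path
```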
Smooth Primal-Dual Coordinate Descent Algorithms for Nonsmooth Convex Optimization
Alacaoglu, Ahmet, Dinh, Quoc Tran, Fercoq, Olivier, Cevher, Volkan
We propose a new randomized coordinate descent method for a convex optimization template with broad applications. Our analysis relies on a novel combination of four ideas applied to the primal-dual gap function: smoothing, acceleration, homotopy, and coordinate descent with non-uniform sampling. As a result, our method features the first convergence rate guarantees among coordinate descent methods that are the best known under a variety of common structural assumptions on the template. We provide numerical evidence supporting the theoretical results, with a comparison to state-of-the-art algorithms.
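For context, the smoothing ingredient refers to Nesterov-type smoothing of a nonsmooth composite term $h(Ax)$ (a background sketch with a fixed prox-center $\dot{y}$ and smoothness parameter $\mu > 0$; homotopy then drives $\mu$ to zero along the iterations):

\[
h_\mu(Ax) \;=\; \max_{y}\ \Big\{ \langle Ax, y \rangle - h^*(y) - \frac{\mu}{2}\,\lVert y - \dot{y} \rVert^2 \Big\},
\]

which is differentiable in $x$ with a gradient that is Lipschitz continuous with constant proportional to $\lVert A \rVert^2 / \mu$.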
Gap Safe screening rules for sparsity enforcing penalties
Ndiaye, Eugene, Fercoq, Olivier, Gramfort, Alexandre, Salmon, Joseph
In high-dimensional regression settings, sparsity enforcing penalties have proved useful to regularize the data-fitting term. A recently introduced technique called screening rules proposes to ignore some variables in the optimization, leveraging the expected sparsity of the solutions and consequently leading to faster solvers. When the procedure is guaranteed not to discard variables wrongly, the rules are said to be safe. In this work, we propose a unifying framework for generalized linear models regularized with standard sparsity enforcing penalties such as $\ell_1$ or $\ell_1/\ell_2$ norms. Our technique makes it possible to safely discard more variables than previously considered safe rules, particularly for low regularization parameters. Our proposed Gap Safe rules (so called because they rely on duality gap computations) can cope with any iterative solver but are particularly well suited to (block) coordinate descent methods. Applied to many standard learning tasks, such as the Lasso, Sparse-Group Lasso, multi-task Lasso, and binary and multinomial logistic regression, we report significant speed-ups compared to previously proposed safe rules on all tested data sets.
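To illustrate how such a rule plugs into a coordinate descent solver, here is a minimal, hedged Lasso sketch (objective $\tfrac{1}{2}\lVert y - Xb\rVert^2 + \lambda \lVert b\rVert_1$; an illustrative dynamic-screening loop, not the paper's implementation):

```python
import numpy as np

def lasso_cd_gap_safe(X, y, lmbda, num_epochs=100, tol=1e-8):
    """Schematic Lasso coordinate descent with Gap Safe screening."""
    n, p = X.shape
    b = np.zeros(p)
    active = np.ones(p, dtype=bool)
    col_sq = (X ** 2).sum(axis=0)
    col_norm = np.sqrt(col_sq)
    for _ in range(num_epochs):
        r = y - X @ b
        theta = r / max(lmbda, np.max(np.abs(X.T @ r)))   # dual-feasible rescaled residual
        primal = 0.5 * r @ r + lmbda * np.abs(b).sum()
        dual = 0.5 * (y @ y) - 0.5 * lmbda ** 2 * ((theta - y / lmbda) ** 2).sum()
        gap = primal - dual
        if gap <= tol:
            break
        # Gap Safe test: features whose score stays below 1 over the whole safe
        # sphere cannot be active at the optimum and are removed from the problem.
        radius = np.sqrt(2.0 * gap) / lmbda
        scores = np.abs(X.T @ theta) + radius * col_norm
        active &= scores >= 1.0
        b[~active] = 0.0
        r = y - X @ b                                     # refresh residual after screening
        # One pass of coordinate descent over the remaining (active) features.
        for j in np.where(active)[0]:
            r += X[:, j] * b[j]                           # partial residual without feature j
            z = X[:, j] @ r
            b[j] = np.sign(z) * max(abs(z) - lmbda, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

# Usage: b_hat = lasso_cd_gap_safe(X, y, lmbda=0.1 * np.max(np.abs(X.T @ y)))
```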