Optimization Insights into Deep Diagonal Linear Networks
Labarrière, Hippolyte, Molinari, Cesare, Rosasco, Lorenzo, Villa, Silvia, Vega, Cristian
In recent years, the application of deep networks has revolutionized the field of machine learning, particularly in tasks involving complex data such as images and natural language. These models, typically trained using stochastic gradient descent, have demonstrated remarkable performance on various benchmarks, raising questions about the underlying mechanisms that contribute to their success. Despite their practical efficacy, the theoretical understanding of these models remains relatively limited, creating a pressing need for deeper insights into their generalization abilities. Classical theory shows that generalization is a consequence of regularization, which is the way to impose a priori knowledge into the model and to favour "simple" solutions. While regularization is usually achieved either by choosing simple models or by explicitly adding a penalty term to the empirical risk during training, this is not the case for deep neural networks, which are trained simply by minimizing the empirical risk. A new perspective has therefore emerged in the recent literature, relating regularization directly to the optimization procedure (gradient-based methods). The main idea is to show that the training dynamics themselves exhibit self-regularizing properties, by inducing an implicit regularization (bias) which prefers generalizing solutions (see [Vardi, 2023] for an extended review of the importance of implicit bias in machine learning). In this context, a common approach is to study simplified models that approximate the networks used in practice. Analyzing the implicit bias of optimization algorithms is easier for such networks, and can still give insights into the good performance of neural networks in various scenarios.
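Since the object of study lends itself to a compact simulation, here is a minimal sketch (our illustration, not code from the paper) of the phenomenon on a two-layer diagonal linear network $w = u\odot u - v\odot v$: plain gradient descent on the unpenalized least-squares risk, started from a small initialization, returns a sparse solution.

```python
import numpy as np

# Minimal sketch (our illustration, not code from the paper): gradient descent
# on a two-layer diagonal linear network w = u*u - v*v for unpenalized least
# squares. With a small initialization, the recovered w tends to be sparse:
# the implicit bias comes from the dynamics, not from an explicit penalty.
rng = np.random.default_rng(0)
n, d = 50, 100
A = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[:5] = 1.0                          # sparse ground truth
y = A @ w_star

alpha, lr = 1e-3, 1e-2                    # small init scale, step size
u = alpha * np.ones(d)
v = alpha * np.ones(d)
for _ in range(20000):
    w = u * u - v * v
    grad_w = A.T @ (A @ w - y) / n        # gradient of the risk in w
    u, v = u - lr * 2 * grad_w * u, v + lr * 2 * grad_w * v  # chain rule
w = u * u - v * v
print("recovered support:", np.flatnonzero(np.abs(w) > 0.1))
```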
Variance reduction techniques for stochastic proximal point algorithms
Traoré, Cheik, Apidopoulos, Vassilis, Salzo, Saverio, Villa, Silvia
In the context of finite-sum minimization, variance reduction techniques are widely used to improve the performance of state-of-the-art stochastic gradient methods. Their practical impact is clear, as are their theoretical properties. Stochastic proximal point algorithms have been studied as an alternative to stochastic gradient algorithms since they are more stable with respect to the choice of the step size, but a proper variance-reduced version is missing. In this work, we propose the first study of variance reduction techniques for stochastic proximal point algorithms. We introduce a stochastic proximal version of SVRG, SAGA, and some of their variants for smooth and convex functions. We provide several convergence results for the iterates and the objective function values. In addition, under the Polyak-{\L}ojasiewicz (PL) condition, we obtain linear convergence rates for the iterates and the function values. Our numerical experiments demonstrate the advantages of the proximal variance reduction methods over their gradient counterparts, especially in terms of stability with respect to the choice of the step size.
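To fix ideas, here is a minimal sketch (our simplified rendition; see the paper for the exact SVRG/SAGA-type schemes and their analysis) of an SVRG-style correction combined with a stochastic proximal point step on a least-squares problem, where the proximal operator of a single summand has a closed form.

```python
import numpy as np

# Minimal sketch (ours; the paper's schemes and step-size rules may differ):
# SVRG-style variance reduction combined with a stochastic proximal point
# step for f(x) = (1/n) * sum_i 0.5*(a_i @ x - b_i)^2. The prox of a single
# summand is available in closed form (rank-one Sherman-Morrison update).
rng = np.random.default_rng(0)
n, d = 200, 20
A = rng.standard_normal((n, d))
A /= np.linalg.norm(A, axis=1, keepdims=True)     # unit-norm rows
x_true = rng.standard_normal(d)
b = A @ x_true

def prox_fi(z, i, gamma):
    # argmin_w 0.5*(a_i @ w - b_i)^2 + ||w - z||^2 / (2*gamma)
    a = A[i]
    return z - gamma * a * (a @ z - b[i]) / (1.0 + gamma * (a @ a))

gamma, epochs = 0.5, 30
x = np.zeros(d)
for _ in range(epochs):
    x_snap = x.copy()
    g_snap = A.T @ (A @ x_snap - b) / n           # full gradient at snapshot
    for _ in range(n):
        i = rng.integers(n)
        g_i = A[i] * (A[i] @ x_snap - b[i])       # per-sample grad at snapshot
        z = x - gamma * (g_snap - g_i)            # variance-reduced shift
        x = prox_fi(z, i, gamma)                  # proximal step on f_i
print("distance to solution:", np.linalg.norm(x - x_true))
```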
Snacks: a fast large-scale kernel SVM solver
Tanji, Sofiane, Della Vecchia, Andrea, Glineur, François, Villa, Silvia
Kernel methods provide a powerful framework for nonparametric learning. They are based on kernel functions and allow learning in a rich functional space while applying linear statistical learning tools, such as Ridge Regression or Support Vector Machines. However, standard kernel methods suffer from quadratic time and memory complexity in the number of data points, and thus have limited applications in large-scale learning. In this paper, we propose Snacks, a new large-scale solver for Kernel Support Vector Machines. Specifically, Snacks relies on a Nystr\"om approximation of the kernel matrix and an accelerated variant of the stochastic subgradient method. We demonstrate, through a detailed empirical evaluation, that it competes with other SVM solvers on a variety of benchmark datasets.
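For concreteness, a minimal sketch of the two ingredients follows (our simplified rendition, not the Snacks code: the acceleration is omitted and a plain Pegasos-style subgradient step is used; the data, kernel width and all constants are placeholders).

```python
import numpy as np

# Minimal sketch (ours, not the Snacks code; acceleration omitted): Nystrom
# features for an RBF kernel, then a Pegasos-style stochastic subgradient
# method on the regularized hinge loss.
rng = np.random.default_rng(0)
n, d, m = 2000, 10, 100                   # samples, input dim, Nystrom centers
X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(n))
sig = np.sqrt(d)                          # RBF bandwidth (placeholder choice)

def rbf(P, Q):
    sq = ((P[:, None, :] - Q[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2 * sig ** 2))

centers = X[rng.choice(n, m, replace=False)]
evals, evecs = np.linalg.eigh(rbf(centers, centers) + 1e-8 * np.eye(m))
T = evecs / np.sqrt(evals)                # so that Phi @ Phi.T ~ K (Nystrom)
Phi = rbf(X, centers) @ T                 # n x m feature matrix

lam, w = 1e-3, np.zeros(m)
for t in range(1, 20001):                 # Pegasos step size: 1/(lam*t)
    i = rng.integers(n)
    if y[i] * (Phi[i] @ w) < 1:           # hinge subgradient is active
        w = (1 - 1 / t) * w + (1 / (lam * t)) * y[i] * Phi[i]
    else:
        w = (1 - 1 / t) * w
print("train accuracy:", np.mean(np.sign(Phi @ w) == y))
```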
Iterative regularization in classification via hinge loss diagonal descent
Apidopoulos, Vassilis, Poggio, Tomaso, Rosasco, Lorenzo, Villa, Silvia
Estimating a quantity of interest from finite measurements is a central problem in a number of fields, including machine learning, statistics, and signal processing. In this context, a key idea is that reliable estimation requires imposing some prior assumptions on the problem at hand. The theory of inverse problems provides a principled framework to formalize this idea [27]. The quantity of interest is typically seen as a function, or a vector, and prior assumptions take the form of suitable functionals, called regularizers. Following this idea, Tikhonov regularization provides a classic approach to estimating solutions [83, 84]. Indeed, the latter are found by minimizing an empirical objective in which a data-fit term is penalized by adding the chosen regularizer. Other regularization approaches are classic in inverse problems; in particular, iterative regularization has become popular in machine learning, see e.g.
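To make the scheme explicit (our notation, for illustration): given a linear model $A$ and data $y$, Tikhonov regularization selects $\hat{x}_\lambda \in \mathrm{argmin}_x\, \|Ax - y\|^2 + \lambda R(x)$, where $R$ is the chosen regularizer and $\lambda>0$ balances data fit against the prior assumptions. Iterative regularization, by contrast, replaces the explicit choice of $\lambda$ with the choice of a stopping time for an optimization method applied to the data-fit term alone.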
Stochastic Zeroth order Descent with Structured Directions
Rando, Marco, Molinari, Cesare, Villa, Silvia, Rosasco, Lorenzo
We introduce and analyze Structured Stochastic Zeroth order Descent (S-SZD), a finite-difference approach which approximates a stochastic gradient on a set of $l\leq d$ orthogonal directions, where $d$ is the dimension of the ambient space. These directions are randomly chosen and may change at each step. For smooth convex functions we prove almost sure convergence of the iterates and a convergence rate on the function values of the form $O((d/l)\,k^{-c})$ for every $c<1/2$, which is arbitrarily close to that of Stochastic Gradient Descent (SGD) in terms of the number of iterations. Our bound also shows the benefit of using $l$ multiple directions instead of one. For non-convex functions satisfying the Polyak-{\L}ojasiewicz condition, we establish the first convergence rates for stochastic zeroth order algorithms under such an assumption. We corroborate our theoretical findings in numerical simulations, where the assumptions are satisfied, and on the real-world problem of hyper-parameter optimization, observing that S-SZD has very good practical performance.
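A minimal sketch of the core step follows (our illustration of the idea behind S-SZD; the step-size schedule, scaling constants and test function are placeholders, not the paper's choices).

```python
import numpy as np

# Minimal sketch (ours; constants and schedules differ from the paper): at
# each iteration draw l <= d random orthonormal directions via a QR
# factorization, estimate the directional derivatives by finite differences,
# and take a descent step along the resulting surrogate gradient.
rng = np.random.default_rng(0)
d, l = 20, 5
curv = np.linspace(0.5, 2.0, d)

def f(x):                                  # black-box objective (example)
    return 0.5 * np.sum(curv * x ** 2)

x = rng.standard_normal(d)
h = 1e-6                                   # finite-difference discretization
for k in range(1, 3001):
    Q, _ = np.linalg.qr(rng.standard_normal((d, l)))   # d x l, orthonormal
    fx = f(x)
    g = np.zeros(d)
    for j in range(l):
        g += (f(x + h * Q[:, j]) - fx) / h * Q[:, j]   # surrogate gradient
    g *= d / l                             # E[Q @ Q.T] = (l/d) I, so rescale
    x -= 0.2 / np.sqrt(k) * g              # diminishing step size
print("f(x) after 3000 steps:", f(x))
```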
Iterative regularization for low complexity regularizers
Molinari, Cesare, Massias, Mathurin, Rosasco, Lorenzo, Villa, Silvia
Iterative regularization exploits the implicit bias of an optimization algorithm to regularize ill-posed problems. Constructing algorithms with such built-in regularization mechanisms is a classic challenge in inverse problems, but also in modern machine learning, where it provides both a new perspective on algorithm analysis and significant speed-ups compared to explicit regularization. In this work, we propose and study the first iterative regularization procedure able to handle biases described by nonsmooth and non-strongly-convex functionals, which are prominent in low-complexity regularization. Our approach is based on a primal-dual algorithm, for which we analyze convergence and stability properties, even in the case where the original problem is infeasible. The general results are illustrated considering the special case of sparse recovery with the $\ell_1$ penalty. Our theoretical results are complemented by experiments showing the computational benefits of our approach.
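As an illustration of the sparse recovery special case, here is a minimal sketch (ours; the paper studies a more general primal-dual scheme with stability guarantees) of a Chambolle-Pock-type iteration for $\min \|x\|_1$ subject to $Ax=y$, used with early stopping on noisy data.

```python
import numpy as np

# Minimal sketch (ours; the paper's algorithm and analysis are more general):
# Chambolle-Pock-type primal-dual iteration for min ||x||_1 s.t. Ax = y.
# With noisy data, the iteration count plays the role of the regularization
# parameter (iterative regularization), so one would stop early.
rng = np.random.default_rng(0)
n, d = 50, 200
A = rng.standard_normal((n, d)) / np.sqrt(n)
x_true = np.zeros(d)
x_true[:5] = 1.0
y = A @ x_true + 1e-3 * rng.standard_normal(n)    # noisy measurements

def soft(v, t):                                    # prox of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

tau = sigma = 0.99 / np.linalg.norm(A, 2)          # tau*sigma*||A||^2 < 1
x, u = np.zeros(d), np.zeros(n)
for k in range(2000):
    x_new = soft(x - tau * (A.T @ u), tau)         # primal (proximal) step
    u += sigma * (A @ (2 * x_new - x) - y)         # dual ascent step
    x = x_new
print("recovered support:", np.flatnonzero(np.abs(x) > 0.1))
```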
Iterative regularization for convex regularizers
Molinari, Cesare, Massias, Mathurin, Rosasco, Lorenzo, Villa, Silvia
Machine learning often reduces to estimating some model parameters. This approach raises at least two kinds of questions: first, multiple solutions may exist, among which a specific one must be selected; second, potential instabilities with respect to noise and sampling must be controlled. A classical way to achieve both goals is to consider explicitly penalized or constrained objective functions. In machine learning, this leads to regularized empirical risk minimization (Shalev-Shwartz and Ben-David, 2014). A more recent approach is based on directly exploiting an iterative optimization procedure for an unconstrained/unpenalized problem. This approach is shared by several related ideas. One is implicit regularization (Mahoney, 2012; Gunasekar et al., 2017), stemming from the observation that the bias is controlled by increasing the number of iterations, just as in penalized methods it is controlled by decreasing the penalty parameter. Another is early stopping (Yao et al., 2007; Raskutti et al., 2014), which emphasizes the fact that running the iterates to convergence might lead to instabilities in the presence of noise. Yet another, more classical, idea is iterative regularization, where both aspects (convergence and stability) are considered relevant (Engl et al., 1996; Kaltenbacher et al., 2008).
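The iterations-versus-penalty analogy can be seen in a minimal simulation (our illustration, using the standard Landweber/gradient iteration; all constants are placeholders): on an ill-conditioned noisy least-squares problem, the distance to the clean solution first decreases and then increases as the iterates begin to fit the noise.

```python
import numpy as np

# Minimal sketch (ours): semi-convergence of gradient descent (Landweber
# iteration) on an ill-conditioned noisy least-squares problem. The error to
# the noise-free solution decreases, then grows as noise is fitted, so the
# iteration number acts like a regularization parameter.
rng = np.random.default_rng(0)
n = 100
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(np.logspace(0, -2, n)) @ U.T       # ill-conditioned operator
x_true = rng.standard_normal(n)
y = A @ x_true + 0.05 * rng.standard_normal(n)     # noisy data

x, lr, errors = np.zeros(n), 1.0, []
for k in range(20000):
    x -= lr * A.T @ (A @ x - y)
    errors.append(np.linalg.norm(x - x_true))
best = int(np.argmin(errors))
print(f"best error {errors[best]:.2f} at iter {best}; final {errors[-1]:.2f}")
```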
Convergence of the Forward-Backward Algorithm: Beyond the Worst Case with the Help of Geometry
Garrigos, Guillaume, Rosasco, Lorenzo, Villa, Silvia
We provide a comprehensive study of the convergence of the forward-backward algorithm under suitable geometric conditions leading to fast rates. We present several new results and collect in a unified view a variety of results scattered in the literature, often providing simplified proofs. Novel contributions include the analysis of infinite-dimensional convex minimization problems, allowing for the case where minimizers might not exist. Further, we analyze the relation between different geometric conditions, and discuss novel connections with a priori conditions in linear inverse problems, including source conditions, restricted isometry properties and partial smoothness.
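As a reference point for the object of study, a minimal sketch of the forward-backward (proximal gradient) iteration follows (ours, instantiated on the lasso; the paper's setting is more general and includes the infinite-dimensional case).

```python
import numpy as np

# Minimal sketch (ours): forward-backward iteration for min_x f(x) + g(x)
# with f smooth and g proximable, instantiated on the lasso:
# f(x) = 0.5*||Ax - y||^2 and g(x) = lam*||x||_1. Under geometric (error
# bound / Lojasiewicz-type) conditions such as those studied in the paper,
# the iterates converge faster than the worst-case O(1/k) rate.
rng = np.random.default_rng(0)
n, d = 100, 50
A = rng.standard_normal((n, d))
y = A @ rng.standard_normal(d)
lam = 0.1

def soft(v, t):                            # prox of t*||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L with L = ||A||^2
x = np.zeros(d)
for _ in range(1000):
    x = soft(x - step * A.T @ (A @ x - y), step * lam)  # forward, backward
print("objective:",
      0.5 * np.sum((A @ x - y) ** 2) + lam * np.sum(np.abs(x)))
```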
Learning with Incremental Iterative Regularization
Rosasco, Lorenzo, Villa, Silvia
Within a statistical learning setting, we propose and study an iterative regularization algorithm for least squares defined by an incremental gradient method. In particular, we show that, if all other parameters are fixed a priori, the number of passes over the data (epochs) acts as a regularization parameter, and prove strong universal consistency, i.e. almost sure convergence of the risk, as well as sharp finite sample bounds for the iterates. Our results are a step towards understanding the effect of multiple epochs in stochastic gradient techniques in machine learning and rely on integrating statistical and optimization results.
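A minimal sketch of the algorithm's main ingredient follows (our illustration; the data, step size, and validation-based monitoring are placeholders, not the paper's statistical setting): incremental gradient passes over the data for least squares, where the number of epochs is the tuning knob.

```python
import numpy as np

# Minimal sketch (ours): incremental gradient for least squares, one pass
# (epoch) at a time. The epoch count plays the role of the regularization
# parameter; a held-out set is used here to monitor it, mimicking early
# stopping.
rng = np.random.default_rng(0)
n, d = 200, 50
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.5 * rng.standard_normal(n)
Xv = rng.standard_normal((n, d))
yv = Xv @ w_true                          # clean validation data

w, lr = np.zeros(d), 1e-3
for epoch in range(1, 51):
    for i in range(n):                    # one incremental pass over the data
        w -= lr * X[i] * (X[i] @ w - y[i])
    if epoch % 10 == 0:
        val = np.mean((Xv @ w - yv) ** 2)
        print(f"epoch {epoch:3d}  validation MSE {val:.4f}")
```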