Malitsky, Yura
Towards Weaker Variance Assumptions for Stochastic Optimization
Alacaoglu, Ahmet, Malitsky, Yura, Wright, Stephen J.
We revisit a classical assumption for analyzing stochastic gradient algorithms where the squared norm of the stochastic subgradient (or the variance for smooth problems) is allowed to grow as fast as the squared norm of the optimization variable. We contextualize this assumption in view of its inception in the 1960s, its seemingly independent appearance in the recent literature, its relationship to weakest-known variance assumptions for analyzing stochastic gradient algorithms, and its relevance in deterministic problems for non-Lipschitz nonsmooth convex optimization. We build on and extend a connection recently made between this assumption and the Halpern iteration. For convex nonsmooth, and potentially stochastic, optimization, we analyze horizon-free, anytime algorithms with last-iterate rates. For problems beyond simple constrained optimization, such as convex problems with functional constraints or regularized convex-concave min-max problems, we obtain rates for optimality measures that do not require boundedness of the feasible set.
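In symbols (the constant names here are our own), the assumption in question allows a stochastic subgradient oracle $g(x, \xi)$ of the objective to satisfy

$\mathbb{E}_{\xi}\,\|g(x, \xi)\|^2 \le c_0 + c_1 \|x\|^2 \quad \text{for all } x,$

with constants $c_0, c_1 \ge 0$; the classical bounded-variance (or bounded-subgradient) setting corresponds to $c_1 = 0$.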
Adaptive Proximal Gradient Method for Convex Optimization
Malitsky, Yura, Mishchenko, Konstantin
In this paper, we explore two fundamental first-order algorithms in convex optimization, namely, gradient descent (GD) and proximal gradient method (ProxGD). Our focus is on making these algorithms entirely adaptive by leveraging local curvature information of smooth functions. We propose adaptive versions of GD and ProxGD that are based on observed gradient differences and, thus, have no added computational costs. Moreover, we prove convergence of our methods assuming only local Lipschitzness of the gradient. In addition, the proposed versions allow for even larger stepsizes than those initially suggested in [MM20].
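As a rough illustration of the mechanism, the sketch below runs a proximal gradient loop whose stepsize is estimated from observed gradient differences. The function names, the example prox, and the particular rule and constants (which follow the earlier AdGD-style rule rather than the refined rule proposed in this paper) are ours, for illustration only.

import numpy as np

def soft_threshold(z, tau):
    """Proximal operator of tau * ||.||_1, used here only as an example prox."""
    return np.sign(z) * np.maximum(np.abs(z) - tau, 0.0)

def adaptive_proxgd(grad_f, prox_h, x0, alpha0=1e-6, iters=500):
    """Illustrative adaptive proximal gradient loop.

    The stepsize is estimated from observed gradient differences
    (a local inverse-Lipschitz estimate); the exact rule and constants
    of the paper may differ -- this follows an AdGD-style rule.
    """
    x_prev, g_prev = x0, grad_f(x0)
    alpha_prev, theta = alpha0, 1.0
    x = prox_h(x_prev - alpha_prev * g_prev, alpha_prev)
    for _ in range(iters):
        g = grad_f(x)
        diff_x = np.linalg.norm(x - x_prev)
        diff_g = np.linalg.norm(g - g_prev)
        # Rule 2: do not overstep the local curvature estimate.
        local = diff_x / (2.0 * diff_g) if diff_g > 0 else np.inf
        # Rule 1: do not increase the stepsize too fast.
        alpha = min(np.sqrt(1.0 + theta) * alpha_prev, local)
        x_prev, g_prev = x, g
        x = prox_h(x - alpha * g, alpha)
        theta, alpha_prev = alpha / alpha_prev, alpha
    return x

For instance, for a lasso-type objective one could pass grad_f = lambda x: A.T @ (A @ x - b) and prox_h = soft_threshold, with A and b standing in for one's own data (hypothetical here).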
Stochastic Variance Reduction for Variational Inequality Methods
Alacaoglu, Ahmet, Malitsky, Yura
We propose stochastic variance reduced algorithms for solving convex-concave saddle point problems, monotone variational inequalities, and monotone inclusions. Our framework applies to extragradient, forward-backward-forward, and forward-reflected-backward methods both in Euclidean and Bregman setups. All proposed methods converge in exactly the same setting as their deterministic counterparts and they either match or improve the best-known complexities for solving structured min-max problems. Our results reinforce the correspondence between variance reduction in variational inequalities and minimization. We also illustrate the improvements of our approach with numerical evaluations on matrix games.
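To make the flavor of such methods concrete, here is a loose sketch of a snapshot-based variance-reduced extragradient loop for a matrix game $\min_x \max_y \langle x, A y \rangle$ over probability simplices. It illustrates the general idea (cheap sampled operator estimates corrected at a periodically refreshed snapshot), not the paper's exact algorithm, stepsize rule, or sampling distribution; all names, defaults, and constants below are ours.

import numpy as np

def project_simplex(v):
    """Euclidean projection onto the probability simplex."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u + (1.0 - css) / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)
    return np.maximum(v + theta, 0.0)

def vr_extragradient_matrix_game(A, iters=2000, step=None, p_snapshot=None, rng=None):
    """Illustrative variance-reduced extragradient for min_x max_y <x, A y>
    over simplices. A sketch of the idea, not the paper's algorithm."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = A.shape
    p_snapshot = p_snapshot or 1.0 / n
    step = step or 0.5 / np.linalg.norm(A, 2)
    x = np.full(m, 1.0 / m); y = np.full(n, 1.0 / n)
    wx, wy = x.copy(), y.copy()           # snapshot point
    Fwx, Fwy = A @ wy, -A.T @ wx          # full operator at the snapshot
    for _ in range(iters):
        # unbiased estimate of the operator using one sampled column / row
        j, i = rng.integers(n), rng.integers(m)
        Fx = Fwx + A[:, j] * (y[j] - wy[j]) * n
        Fy = Fwy - A[i, :] * (x[i] - wx[i]) * m
        # extragradient: leading step, then update from the midpoint
        xh = project_simplex(x - step * Fx)
        yh = project_simplex(y - step * Fy)
        j, i = rng.integers(n), rng.integers(m)
        Fxh = Fwx + A[:, j] * (yh[j] - wy[j]) * n
        Fyh = Fwy - A[i, :] * (xh[i] - wx[i]) * m
        x = project_simplex(x - step * Fxh)
        y = project_simplex(y - step * Fyh)
        # occasionally refresh the snapshot (loopless variance reduction)
        if rng.random() < p_snapshot:
            wx, wy = x.copy(), y.copy()
            Fwx, Fwy = A @ wy, -A.T @ wx
    return x, y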
Convergence of adaptive algorithms for weakly convex constrained optimization
Alacaoglu, Ahmet, Malitsky, Yura, Cevher, Volkan
We analyze the adaptive first-order algorithm AMSGrad for solving a constrained stochastic optimization problem with a weakly convex objective. We prove the $\tilde{\mathcal{O}}(t^{-1/4})$ rate of convergence for the norm of the gradient of the Moreau envelope, which is the standard stationarity measure for this class of problems. It matches the known rates that adaptive algorithms enjoy in the specific case of unconstrained smooth stochastic optimization. Our analysis works with a mini-batch size of $1$, constant first- and second-order moment parameters, and possibly unbounded optimization domains. Finally, we illustrate the applications and extensions of our results to specific problems and algorithms.
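For concreteness, a minimal sketch of projected AMSGrad in the regime described above (mini-batch size $1$, constant moment parameters, a projection onto the constraint set) might look as follows; the names and default constants are ours and purely illustrative.

import numpy as np

def projected_amsgrad(grad, project, x0, alpha=1e-3, beta1=0.9, beta2=0.99,
                      eps=1e-8, iters=1000):
    """Illustrative projected AMSGrad loop with constant moment parameters."""
    x = x0.copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    v_hat = np.zeros_like(x)
    for _ in range(iters):
        g = grad(x)                       # stochastic gradient at x
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)      # AMSGrad: monotone second moment
        x = project(x - alpha * m / (np.sqrt(v_hat) + eps))
    return x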
Adaptive gradient descent without descent
Malitsky, Yura, Mishchenko, Konstantin
We present a strikingly simple proof that two rules are sufficient to automate gradient descent: 1) don't increase the stepsize too fast and 2) don't overstep the local curvature. By following these rules, you get a method adaptive to the local geometry, with convergence guarantees depending only on smoothness in a neighborhood of a solution. If the problem is convex, our method converges even when the global smoothness constant is infinite. As an illustration, it can minimize an arbitrary twice continuously differentiable convex function. We examine its performance on a range of convex and nonconvex problems, including matrix factorization and training of ResNet-18.

1 Introduction

Since the early days of optimization it has been evident that there is a need for algorithms that are as independent of the user as possible. First-order methods have proven to be versatile and efficient in a wide range of applications, but one drawback has been present all this time: the stepsize. Despite certain success stories, line-search procedures and adaptive online methods have not removed the need to manually tune the optimization parameters. Even in smooth convex optimization, which is often believed to be much simpler than its nonconvex counterpart, robust rules for stepsize selection have remained elusive. The purpose of this work is to remedy this deficiency. The problem formulation that we consider is the basic unconstrained optimization problem

$\min_{x \in \mathbb{R}^d} f(x), \qquad (1)$

where $f \colon \mathbb{R}^d \to \mathbb{R}$ is a differentiable function. Throughout the paper we assume that (1) has a solution, and we denote its optimal value by $f^*$. The simplest and best-known approach to this problem is the gradient descent method (GD), whose origin can be traced back to Cauchy [7, 20]. Although it is probably the oldest optimization method, it continues to play a central role in modern algorithmic theory and applications.
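For reference, one concrete form of a stepsize rule implementing the two ideas above (our transcription; the paper's exact constants may differ) is

$\lambda_k = \min\Big\{ \sqrt{1 + \theta_{k-1}}\,\lambda_{k-1},\ \frac{\|x_k - x_{k-1}\|}{2\,\|\nabla f(x_k) - \nabla f(x_{k-1})\|} \Big\}, \qquad x_{k+1} = x_k - \lambda_k \nabla f(x_k), \qquad \theta_k = \lambda_k / \lambda_{k-1},$

where the first term keeps the stepsize from growing too quickly (rule 1) and the second term is a local inverse-curvature estimate (rule 2).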