An Alternating Optimization Method for Bilevel Problems under the Polyak-Łojasiewicz Condition
Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning. Recent results have shown that simple alternating (implicit) gradient-based algorithms can match the convergence rate of single-level gradient descent (GD) when addressing bilevel problems with a strongly convex lower-level objective. However, it remains unclear whether this result can be generalized to bilevel problems beyond this basic setting. In this paper, we first introduce a stationary metric for the considered bilevel problems, which generalizes the existing metric, for a nonconvex lower-level objective that satisfies the Polyak-Łojasiewicz (PL) condition. We then propose a Generalized ALternating mEthod for bilevel opTimization (GALET), tailored to bilevel optimization (BLO) with a convex PL lower-level (LL) problem, and establish that GALET achieves an $\epsilon$-stationary point for the considered problem within $\tilde{\cal O}(\epsilon^{-1})$ iterations, which matches the iteration complexity of GD for single-level smooth nonconvex problems.
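For readers less familiar with the setting, a standard way to write the bilevel problem and the PL condition referenced above is the following (the notation here is ours for illustration, not necessarily the paper's):
$$ \min_{x}\; f\bigl(x, y^*(x)\bigr) \quad \text{s.t.} \quad y^*(x) \in \operatorname*{arg\,min}_{y}\; g(x, y), $$
where the LL objective $g(x,\cdot)$ satisfies the PL condition if there exists $\mu > 0$ such that $\tfrac{1}{2}\|\nabla_y g(x,y)\|^2 \ge \mu \bigl(g(x,y) - \min_{y'} g(x,y')\bigr)$ for all $y$. Strong convexity implies PL, but PL also admits nonconvex LL objectives with non-unique minimizers, which is what makes the generalized stationarity metric necessary.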
Stochastic Gradient Descent Revisited
The advent of artificial intelligence (AI) has been rendered possible by the spectacular acceleration of computing chip capacity over the last few decades, and has driven a technological revolution that has not spared any aspect of life, including healthcare, supply chain management, social media, etc. AI describes a set of machine learning methods that abandon any form of structural representation of data and look instead into uncovering data patterns to produce probabilistic relationships between input and output quantities of interest. While it has significantly improved people's standards of living, AI has nevertheless engendered many operational risks (e.g. by producing undesirable or unexpected outcomes) as well as systemic risks (e.g. the "Flash Crash", whereby a blue-chip company's share price suddenly plummeted and bounced back in the span of minutes [KL13]). To better manage, prevent and mitigate such risks, some level of mathematical insight must be brought in to shed light on the inner workings of AI, allowing practitioners and regulators alike to act upon it, both to increase its efficiency and to curb its shortcomings. Stochastic gradient descent (SGD) is the engine of AI, making it a natural stepping stone toward mathematically explaining AI. Indeed, to capture their intricacies, machine learning problems are often modeled using wide and highly parametrized neural networks [GBC16], which are then solved using SGD or an adaptive variant thereof, namely Adagrad, Adadelta, RMSProp, Adamax or Adam [Rud17]. To approximate a stationary point of a given loss landscape (also referred to as an objective or cost function [LZB22; AL24; AMA05]), SGD recursively spawns a trajectory of iterates by factoring in, at each step, a stochastic gradient modulated by a positive learning rate. Whereas the classical SGD literature provides convergence guarantees and convergence rates within a (strongly) convex framework [Duf96; BV04; RM51], machine learning models are often highly nonconvex and require new SGD frameworks to better understand and parametrize them.
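Concretely, the SGD recursion described above takes the standard form (a generic sketch; step-size schedules and gradient oracles vary across settings):
$$ x_{k+1} = x_k - \gamma_k\, g(x_k, \xi_k), \qquad \mathbb{E}\bigl[g(x_k, \xi_k) \mid x_k\bigr] = \nabla f(x_k), $$
where $f$ is the loss landscape, $\gamma_k > 0$ is the learning rate, and $\xi_k$ is the random sample (e.g. a mini-batch) drawn at step $k$.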
Faster Stochastic Algorithms for Minimax Optimization under Polyak-Łojasiewicz Condition
This paper considers stochastic first-order algorithms for minimax optimization under Polyak-Łojasiewicz (PL) conditions. We prove that SPIDER-GDA can find an $\epsilon$-approximate solution within ${\mathcal O}\left((n+\sqrt{n}\,\kappa_x\kappa_y^2)\log(1/\epsilon)\right)$ stochastic first-order oracle (SFO) complexity, which is better than the state-of-the-art method whose SFO upper bound is ${\mathcal O}\big((n+n^{2/3}\kappa_x\kappa_y^2)\log(1/\epsilon)\big)$, where $\kappa_x\triangleq L/\mu_x$ and $\kappa_y\triangleq L/\mu_y$. For the ill-conditioned case, we provide an accelerated algorithm to reduce the computational cost further. Our ideas can also be applied to the more general setting in which the objective function satisfies the PL condition with respect to only one variable. Numerical experiments validate the superiority of the proposed methods.
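For context, the underlying problem is the finite-sum minimax problem (our notation, assumed rather than quoted from the paper):
$$ \min_{x} \max_{y}\; f(x,y) = \frac{1}{n} \sum_{i=1}^{n} f_i(x,y), $$
where each $f_i$ is $L$-smooth, $f(\cdot, y)$ satisfies the PL condition with constant $\mu_x$, and $-f(x, \cdot)$ satisfies it with constant $\mu_y$, so that $\kappa_x$ and $\kappa_y$ play the role of condition numbers for the two variables.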
Online Non-Stationary Stochastic Quasar-Convex Optimization
Recent research has shown that quasar-convexity can be found in applications such as identification of linear dynamical systems and generalized linear models. Such observations have in turn spurred exciting developments in the design and analysis of algorithms that exploit quasar-convexity. In this work, we study online stochastic quasar-convex optimization problems in a dynamic environment. We establish regret bounds of online gradient descent in terms of the cumulative path variation and cumulative gradient variance for losses satisfying quasar-convexity and strong quasar-convexity. We then apply the results to generalized linear models (GLM) when the underlying parameter is time-varying. We establish regret bounds of online gradient descent when applied to GLMs with leaky ReLU, logistic, and ReLU activation functions. Numerical results are presented to corroborate our findings.
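As a reminder (this is the standard definition from the quasar-convexity literature, stated here for convenience rather than quoted from the paper), a differentiable function $f$ is $\gamma$-quasar-convex with respect to a minimizer $x^*$, for $\gamma \in (0,1]$, if
$$ f(x^*) \ge f(x) + \frac{1}{\gamma} \langle \nabla f(x),\, x^* - x \rangle \quad \text{for all } x, $$
and $\mu$-strongly quasar-convex if the right-hand side additionally carries a $\frac{\mu}{2}\|x^* - x\|^2$ term; the case $\gamma = 1$ corresponds to star-convexity.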
Noisy Linear Convergence of Stochastic Gradient Descent for CV@R Statistical Learning under Polyak-Łojasiewicz Conditions
Conditional Value-at-Risk ($\mathrm{CV@R}$) is one of the most popular measures of risk, which has been recently considered as a performance criterion in supervised statistical learning, as it is related to desirable operational features in modern applications, such as safety, fairness, distributional robustness, and prediction error stability. However, due to its variational definition, $\mathrm{CV@R}$ is commonly believed to result in difficult optimization problems, even for smooth and strongly convex loss functions. We disprove this statement by establishing noisy (i.e., fixed-accuracy) linear convergence of stochastic gradient descent for sequential $\mathrm{CV@R}$ learning, for a large class of not necessarily strongly-convex (or even convex) loss functions satisfying a set-restricted Polyak-Łojasiewicz inequality. This class contains all smooth and strongly convex losses, confirming that classical problems, such as linear least squares regression, can be solved efficiently under the $\mathrm{CV@R}$ criterion, just as their risk-neutral versions can. Our results are illustrated numerically on such a risk-aware ridge regression task, also verifying their validity in practice.
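The variational definition referred to above is the classical Rockafellar-Uryasev representation: for a confidence level $\alpha \in (0,1)$ and a random loss $Z$,
$$ \mathrm{CV@R}_{\alpha}(Z) = \inf_{t \in \mathbb{R}} \Bigl\{ t + \frac{1}{1-\alpha}\, \mathbb{E}\bigl[(Z - t)_+\bigr] \Bigr\}, \qquad (z)_+ \triangleq \max\{z, 0\}, $$
so that $\mathrm{CV@R}$ learning amounts to a joint minimization over the model parameters and the auxiliary variable $t$, which is what makes the resulting problem harder than its risk-neutral counterpart.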