A Further Related Work on Nonsmooth Nonconvex Optimization
To appreciate the difficulty and broad scope of the research agenda in nonsmooth nonconvex optimization, we start by reviewing the relevant existing literature. First, the existing work is mostly devoted to establishing the asymptotic convergence properties of various optimization algorithms, including gradient sampling (GS) methods [16-18, 57, 19], bundle methods [56, 40], and subgradient methods [8, 65, 30, 28, 12]. More specifically, Burke et al. [16] provided a systematic investigation of approximating the Clarke subdifferential through random sampling and proposed a gradient bundle method [17], the precursor of GS methods, for optimizing a nonconvex, nonsmooth, and non-Lipschitz function. Later, Burke et al. [18] and Kiwiel [57] proposed GS methods by incorporating key modifications into the algorithmic scheme of Burke et al. [17] and proved that every cluster point of the iterates generated by GS methods is a Clarke stationary point. For an overview of GS methods, we refer to Burke et al. [19].
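To make the GS idea concrete, here is a minimal one-dimensional sketch: sample gradients at randomly perturbed points near the current iterate and descend along the minimum-norm element of their convex hull (in 1-D this element has a closed form). It is a toy illustration only, omitting the line search and sampling-radius reduction of the actual methods of Burke et al. [18] and Kiwiel [57]; all names and constants are illustrative.

```python
import random

def gradient_sampling_1d(f_grad, x0, n_samples=10, radius=0.1,
                         step=0.1, iters=200, seed=0):
    """Toy 1-D gradient sampling sketch (illustrative only: no line
    search or sampling-radius reduction, unlike the full GS methods)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(iters):
        # Sample gradients at randomly perturbed points near x.
        grads = [f_grad(x + radius * rng.uniform(-1.0, 1.0))
                 for _ in range(n_samples)]
        grads.append(f_grad(x))
        lo, hi = min(grads), max(grads)
        # Minimum-norm element of the convex hull of the sampled
        # gradients; in 1-D the hull is just the interval [lo, hi].
        if lo <= 0.0 <= hi:
            g = 0.0                        # 0 in the hull: approximately stationary
        else:
            g = lo if lo > 0.0 else hi     # endpoint closest to 0
        if g == 0.0:
            break
        x -= step * g
    return x

# f(x) = |x| is nonsmooth exactly at its minimizer 0.
x_star = gradient_sampling_1d(lambda x: 1.0 if x >= 0 else -1.0, x0=2.0)
```

Near the kink the sampled gradients have mixed signs, so the hull contains 0 and the method recognizes approximate Clarke stationarity, which plain subgradient descent cannot do from gradient information at a single point.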
Oracle Complexity in Nonsmooth Nonconvex Optimization
It is well-known that given a smooth, bounded-from-below, and possibly nonconvex function, standard gradient-based methods can find $\epsilon$-stationary points (with gradient norm less than $\epsilon$) in $\mathcal{O}(1/\epsilon^2)$ iterations. However, many important nonconvex optimization problems, such as those associated with training modern neural networks, are inherently not smooth, making these results inapplicable. In this paper, we study nonsmooth nonconvex optimization from an oracle complexity viewpoint, where the algorithm is assumed to be given access only to local information about the function at various points. We provide two main results (under mild assumptions): First, we consider the problem of getting \emph{near} $\epsilon$-stationary points. This is perhaps the most natural relaxation of \emph{finding} $\epsilon$-stationary points, which is impossible in the nonsmooth nonconvex case. We prove that this relaxed goal cannot be achieved efficiently, for any distance and $\epsilon$ smaller than some constants. Our second result deals with the possibility of tackling nonsmooth nonconvex optimization by reduction to smooth optimization: Namely, applying smooth optimization methods on a smooth approximation of the objective function. For this approach, we prove an inherent trade-off between oracle complexity and smoothness: On the one hand, smoothing a nonsmooth nonconvex function can be done very efficiently (e.g., by randomized smoothing), but with dimension-dependent factors in the smoothness parameter, which can strongly affect iteration complexity when plugging into standard smooth optimization methods. On the other hand, these dimension factors can be eliminated with suitable smoothing methods, but only by making the oracle complexity of the smoothing process exponentially large.
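To make the randomized-smoothing route concrete, the following is a hedged Monte-Carlo sketch of estimating the gradient of the uniformly smoothed surrogate $f_\delta(x)=\mathbb{E}_u[f(x+\delta u)]$, with $u$ uniform on the unit sphere, via a two-point estimator. The factor $d$ in the estimator is precisely the kind of dimension-dependent scaling the trade-off above refers to. Function names and constants are illustrative, not the paper's construction.

```python
import math
import random

def smoothed_grad_estimate(f, x, delta=0.05, n_samples=200, seed=0):
    """Monte-Carlo estimate of the gradient of the smoothed surrogate
    f_delta(x) = E_u[f(x + delta*u)], u uniform on the unit sphere,
    via the two-point estimator (d/(2*delta)) * (f(x+du) - f(x-du)) * u.
    Illustrative sketch; note the explicit dimension factor d."""
    rng = random.Random(seed)
    d = len(x)
    grad = [0.0] * d
    for _ in range(n_samples):
        # Sample u uniformly on the unit sphere (normalized Gaussian).
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(ui * ui for ui in u))
        u = [ui / norm for ui in u]
        xp = [xi + delta * ui for xi, ui in zip(x, u)]
        xm = [xi - delta * ui for xi, ui in zip(x, u)]
        scale = d * (f(xp) - f(xm)) / (2.0 * delta)
        for i in range(d):
            grad[i] += scale * u[i] / n_samples
    return grad

# f(x) = ||x||_1 is nonsmooth; away from the kinks the estimate should
# approach the true gradient, here the sign vector (1, -1).
f = lambda x: sum(abs(xi) for xi in x)
g = smoothed_grad_estimate(f, [1.0, -1.0])
```

Only zeroth-order (function-value) access is needed, which is why this kind of smoothing is cheap per sample; the cost shows up instead through $d$ in the smoothness parameter of $f_\delta$.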
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
We analyze stochastic gradient algorithms for optimizing nonconvex, nonsmooth finite-sum problems. In particular, the objective function is given by the summation of a differentiable (possibly nonconvex) component, together with a possibly non-differentiable but convex component. We propose a proximal stochastic gradient algorithm based on variance reduction, called ProxSVRG+. Our main contribution lies in the analysis of ProxSVRG+. It recovers several existing convergence results and improves/generalizes them (in terms of the number of stochastic gradient oracle calls and proximal oracle calls). In particular, ProxSVRG+ generalizes the best results given by the SCSG algorithm, recently proposed by [Lei et al., NIPS'17] for the smooth nonconvex case. ProxSVRG+ is also more straightforward than SCSG and yields simpler analysis. Moreover, ProxSVRG+ outperforms the deterministic proximal gradient descent (ProxGD) for a wide range of minibatch sizes, which partially solves an open problem proposed in [Reddi et al., NIPS'16].
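The proximal step at the heart of such methods is easy to sketch. The toy loop below applies a plain minibatch proximal stochastic gradient step, without the variance-reduction correction that distinguishes ProxSVRG+, to a one-dimensional finite sum with an $\ell_1$ regularizer, whose proximal operator is soft-thresholding. All names and constants are illustrative.

```python
import random

def soft_threshold(z, t):
    """prox of t*|.|, i.e. argmin_x 0.5*(x - z)^2 + t*|x|."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def prox_sgd(data, lam=0.1, step=0.05, iters=200, batch=2, seed=0):
    """Minimal proximal stochastic gradient sketch for
        min_x (1/n) * sum_i 0.5*(x - a_i)^2 + lam*|x|,
    using a minibatch gradient for the smooth part and a prox step for
    the nonsmooth part (no variance reduction; illustrative only)."""
    rng = random.Random(seed)
    x = 0.0
    for _ in range(iters):
        sample = rng.sample(data, batch)              # minibatch of size B
        g = sum(x - a for a in sample) / batch        # stochastic gradient of f
        x = soft_threshold(x - step * g, step * lam)  # prox step handles h
    return x

data = [1.0, 2.0, 3.0, 4.0]   # mean 2.5; the minimizer is 2.5 - lam = 2.4
x_hat = prox_sgd(data)
```

ProxSVRG+ keeps exactly this prox structure but replaces the raw minibatch gradient with a variance-reduced estimator anchored at a (subsampled) reference gradient, which is what improves the oracle-complexity bounds.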
A Further Related Work on Optimization
In contrast to these gradient-based methods, we focus on gradient-free methods in this paper. We are also aware of many recent works on algorithmic design for structured nonsmooth nonconvex optimization. Then, we proceed to prove the second statement. In this section, we present some technical lemmas for analyzing the convergence properties of the gradient-free method and its two-phase version. We also give the proofs of Theorems 3.2 and 3.4.
Online Convex Optimization with Heavy Tails: Old Algorithms, New Regrets, and Applications
In Online Convex Optimization (OCO), when the stochastic gradient has a finite variance, many algorithms provably work and guarantee a sublinear regret. However, limited results are known if the gradient estimate has a heavy tail, i.e., the stochastic gradient only admits a finite $\mathsf{p}$-th central moment for some $\mathsf{p}\in\left(1,2\right]$. Motivated by this gap, this work examines several classical algorithms for OCO (e.g., Online Gradient Descent) in the more challenging heavy-tailed setting. Under the standard bounded domain assumption, we establish new regrets for these classical methods without any algorithmic modification. Remarkably, these regret bounds are fully optimal in all parameters (and can be achieved even without knowing $\mathsf{p}$), suggesting that OCO with heavy tails can be solved effectively without any extra operation (e.g., gradient clipping). Our new results have several applications. A particularly interesting one is the first provable convergence result for nonsmooth nonconvex optimization under heavy-tailed noise without gradient clipping. Furthermore, we explore broader settings (e.g., smooth OCO) and extend our ideas to optimistic algorithms to handle different cases simultaneously.
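A minimal sketch of one such "old algorithm", projected Online Gradient Descent with the classic $D/(G\sqrt{t})$ step size and no gradient clipping, might look as follows. The heavy-tailed noise model and all names are illustrative assumptions, not the paper's exact setting.

```python
import math
import random

def ogd_bounded(grad_fn, T, lo=-1.0, hi=1.0, D=2.0, G=1.0):
    """Projected Online Gradient Descent on the interval [lo, hi] with
    step size eta_t = D/(G*sqrt(t)); the stochastic (sub)gradients are
    used as-is, with no clipping or other modification."""
    x, xs = 0.0, []
    for t in range(1, T + 1):
        xs.append(x)
        eta = D / (G * math.sqrt(t))
        x = min(hi, max(lo, x - eta * grad_fn(x)))  # step + projection
    return xs

# Subgradients of f(x) = |x - 0.5| corrupted by Pareto-tailed noise:
# Pareto(1.5) has a finite mean (= 3) but infinite variance, i.e. only
# a finite p-th moment for p < 1.5, matching the heavy-tailed regime.
rng = random.Random(0)
def heavy_grad(x):
    noise = rng.paretovariate(1.5) - 3.0   # mean-zero, heavy right tail
    return (1.0 if x > 0.5 else -1.0) + noise

iterates = ogd_bounded(heavy_grad, T=2000)
```

The projection onto the bounded domain is what keeps an occasional enormous gradient sample from derailing the iterates, which matches the intuition behind the clipping-free guarantees discussed above.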
Reviews: A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
This paper focuses on the optimization problem min f(x) + h(x), where f is of a finite-sum structure (with n functions in the sum), with nonconvex but smooth components, and h is a convex but possibly nonsmooth function. So, this is a nonconvex finite-sum problem with a convex regularizer. Function h is treated using a prox step. The authors propose a small modification to ProxSVRG (called ProxSVRG+), and prove that this small modification has surprisingly interesting consequences. The modification consists in replacing the full gradient computation in the outer loop of ProxSVRG by an approximation thereof through subsampling/minibatching (batch size B).