AITopics | non-smooth problem

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/\epsilon)

Neural Information Processing SystemsMar-17-2026, 10:35:09 GMT

In this paper, we develop a novel {\bf ho}moto{\bf p}y {\bf s}moothing (HOPS) algorithm for solving a family of non-smooth problems that is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems is $O(1/\epsilon)$ without any assumption on the strong convexity. In this work, we will show that the proposed HOPS achieved a lower iteration complexity of $\tilde O(1/\epsilon^{1-\theta})$ with $\theta\in(0,1]$ capturing the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method and runs in stages, which gradually decreases the smoothing parameter in a stage-wise manner until it yields a sufficiently good approximation of the original function. We show that HOPS enjoys a linear convergence for many well-known non-smooth problems (e.g., empirical risk minimization with a piece-wise linear loss function and $\ell_1$ norm regularizer, finding a point in a polyhedron, cone programming, etc). Experimental results verify the effectiveness of HOPS in comparison with Nesterov's smoothing algorithm and the primal-dual style of first-order methods.

artificial intelligence, optimization problem, proceedings, (9 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.81)

Add feedback

Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Igor Colin, Ludovic DOS SANTOS, Kevin Scaman

Neural Information Processing SystemsFeb-15-2026, 01:39:27 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, convergence rate, objective function, (14 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/\epsilon)

Neural Information Processing SystemsNov-21-2025, 15:17:26 GMT

In this paper, we develop a novel {\bf ho}moto{\bf p}y {\bf s}moothing (HOPS) algorithm for solving a family of non-smooth problems that is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems is $O(1/\epsilon)$ without any assumption on the strong convexity. In this work, we will show that the proposed HOPS achieved a lower iteration complexity of $\tilde O(1/\epsilon^{1-\theta})$ with $\theta\in(0,1]$ capturing the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method and runs in stages, which gradually decreases the smoothing parameter in a stage-wise manner until it yields a sufficiently good approximation of the original function. We show that HOPS enjoys a linear convergence for many well-known non-smooth problems (e.g., empirical risk minimization with a piece-wise linear loss function and $\ell_1$ norm regularizer, finding a point in a polyhedron, cone programming, etc). Experimental results verify the effectiveness of HOPS in comparison with Nesterov's smoothing algorithm and the primal-dual style of first-order methods.

homotopy smoothing, lower complexity, non-smooth problem, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.81)

Add feedback

Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Igor Colin, Ludovic DOS SANTOS, Kevin Scaman

Neural Information Processing SystemsAug-20-2025, 09:03:09 GMT

Distributing the training of deep neural networks can be tackled from several angles.

algorithm, convergence rate, objective function, (14 more...)

Neural Information Processing Systems

Country: North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Reviews: Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/\epsilon)

Neural Information Processing SystemsJan-20-2025, 18:26:00 GMT

The submission considers algorithms for solving a specific class of optimization problems, namely min_{x in Omega_1} F(x), where F(x) max_{u in Omega_2} \langle Ax, u \rangle - phi(u) g(x). Here, g is convex, Omega_1 is closed and convex, Omega_2 is closed, convex, and bounded, and the set of optimal solutions Omega_* \subset Omega_1 is convex, compact, and non-empty. The submission also assumes a proximal mapping for g can be computed efficiently. The above framework is apparently general enough to capture a number of applications, including various natural regularized empirical loss minimization problems that arise in machine learning. Classic work of Nesterov combined a smooth approximation technique with accelerated proximal gradient descent to converge to a solution with epsilon of optimal in O(1/epsilon) iterations.

approximation, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.64)
Information Technology > Artificial Intelligence > Machine Learning (0.61)

Add feedback

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/\epsilon)

Xu, Yi, Yan, Yan, Lin, Qihang, Yang, Tianbao

Neural Information Processing SystemsFeb-14-2020, 07:58:35 GMT

In this paper, we develop a novel {\bf ho}moto{\bf p}y {\bf s}moothing (HOPS) algorithm for solving a family of non-smooth problems that is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems is $O(1/\epsilon)$ without any assumption on the strong convexity. In this work, we will show that the proposed HOPS achieved a lower iteration complexity of $\tilde O(1/\epsilon {1-\theta})$ with $\theta\in(0,1]$ capturing the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method and runs in stages, which gradually decreases the smoothing parameter in a stage-wise manner until it yields a sufficiently good approximation of the original function.

homotopy smoothing, iteration complexity, non-smooth problem, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.84)

Add feedback

Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Colin, Igor, Santos, Ludovic Dos, Scaman, Kevin

arXiv.org Machine LearningOct-11-2019

We investigate the theoretical limits of pipeline parallel learning of deep learning architectures, a distributed setup in which the computation is distributed per layer instead of per example. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal. For non-smooth convex functions, we provide a novel algorithm coined Pipeline Parallel Random Smoothing (PPRS) that is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension. While the convergence rate still obeys a slow $\varepsilon^{-2}$ convergence rate, the depth-dependent part is accelerated, resulting in a near-linear speed-up and convergence time that only slightly depends on the depth of the deep learning architecture. Finally, we perform an empirical analysis of the non-smooth non-convex case and show that, for difficult and highly non-smooth problems, PPRS outperforms more traditional optimization algorithms such as gradient descent and Nesterov's accelerated gradient descent for problems where the sample size is limited, such as few-shot or adversarial learning.

algorithm, convergence rate, objective function, (14 more...)

arXiv.org Machine Learning

1910.05104

Country: North America > Canada (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Inexact Proximal Gradient Methods for Non-convex and Non-smooth Optimization

Gu, Bin, Huo, Zhouyuan, Huang, Heng

arXiv.org Machine LearningDec-18-2016

Non-convex and non-smooth optimization plays an important role in machine learning. Proximal gradient method is one of the most important methods for solving the nonconvex and non-smooth problems, where a proximal operator need to be solved exactly for each step. However, in a lot of problems the proximal operator does not have an analytic solution, or is expensive to obtain an exact solution. In this paper, we propose inexact proximal gradient methods (not only a basic inexact proximal gradient method (IPG), but also a Nesterov's accelerated inexact proximal gradient method (AIPG)) for non-convex and non-smooth optimization, which tolerate an error in the calculation of the proximal operator. Theoretical analysis shows that IPG and AIPG have the same convergence rates as in the error-free case, provided that the errors decrease at appropriate rates. Keywords: Non-convex optimization, non-smooth optimization, proximal gradient, inexact proximal operator, Nesterov's accelerated method

artificial intelligence, machine learning, optimization, (13 more...)

arXiv.org Machine Learning

1612.06003

Country: North America > United States > Texas (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/\epsilon)$

Xu, Yi, Yan, Yan, Lin, Qihang, Yang, Tianbao

arXiv.org Machine LearningNov-3-2016

In this paper, we develop a novel {\bf ho}moto{\bf p}y {\bf s}moothing (HOPS) algorithm for solving a family of non-smooth problems that is composed of a non-smooth term with an explicit max-structure and a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems is $O(1/\epsilon)$ without any assumption on the strong convexity. In this work, we will show that the proposed HOPS achieved a lower iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$\footnote{$\widetilde O()$ suppresses a logarithmic factor.} with $\theta\in(0,1]$ capturing the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method and runs in stages, which gradually decreases the smoothing parameter in a stage-wise manner until it yields a sufficiently good approximation of the original function. We show that HOPS enjoys a linear convergence for many well-known non-smooth problems (e.g., empirical risk minimization with a piece-wise linear loss function and $\ell_1$ norm regularizer, finding a point in a polyhedron, cone programming, etc). Experimental results verify the effectiveness of HOPS in comparison with Nesterov's smoothing algorithm and the primal-dual style of first-order methods.

artificial intelligence, machine learning, optimization problem, (17 more...)

arXiv.org Machine Learning

1607.03815

Country:

North America > United States > Iowa > Johnson County > Iowa City (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Filters

Collaborating Authors

non-smooth problem

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/\epsilon)

Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/\epsilon)

Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Reviews: Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/\epsilon)

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than O(1/\epsilon)

Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Inexact Proximal Gradient Methods for Non-convex and Non-smooth Optimization

Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/\epsilon)$