
Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/\epsilon)$

Neural Information Processing Systems

In this paper, we develop a novel homotopy smoothing (HOPS) algorithm for solving a family of non-smooth problems composed of a non-smooth term with an explicit max-structure and either a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems without any strong convexity assumption is $O(1/\epsilon)$. In this work, we show that the proposed HOPS achieves a lower iteration complexity of $\tilde O(1/\epsilon^{1-\theta})$, with $\theta\in(0,1]$ capturing the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without a strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method and runs in stages, gradually decreasing the smoothing parameter stage by stage until it yields a sufficiently good approximation of the original function. We show that HOPS enjoys linear convergence for many well-known non-smooth problems (e.g., empirical risk minimization with a piecewise linear loss function and an $\ell_1$ norm regularizer, finding a point in a polyhedron, and cone programming). Experimental results verify the effectiveness of HOPS in comparison with Nesterov's smoothing algorithm and primal-dual style first-order methods.
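The stage-wise idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the test problem ($\min_x \|Ax-b\|_1 + \lambda\|x\|_1$, whose first term has the max-structure $\max_{\|u\|_\infty\le 1}\langle Ax-b,u\rangle$), the quadratic smoothing term, the function names, and the defaults `mu0`, `shrink`, `stages`, and `iters` are all assumptions chosen for illustration.

```python
import numpy as np

def soft_threshold(v, tau):
    # Proximal mapping of tau * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def objective(A, b, lam, x):
    # Original non-smooth objective ||Ax - b||_1 + lam * ||x||_1.
    return np.abs(A @ x - b).sum() + lam * np.abs(x).sum()

def smoothed_grad(A, b, mu, x):
    # Gradient of max_{||u||_inf <= 1} <Ax - b, u> - (mu / 2) * ||u||^2.
    # The maximizer is the residual scaled by 1 / mu and clipped to [-1, 1].
    u = np.clip((A @ x - b) / mu, -1.0, 1.0)
    return A.T @ u

def apg(A, b, lam, mu, x0, iters):
    # Accelerated proximal gradient (FISTA-style) on the smoothed problem.
    L = np.linalg.norm(A, 2) ** 2 / mu  # Lipschitz constant of the smoothed gradient
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(iters):
        x_new = soft_threshold(y - smoothed_grad(A, b, mu, y) / L, lam / L)
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)
        x, t = x_new, t_new
    return x

def hops_sketch(A, b, lam=0.0, mu0=1.0, shrink=2.0, stages=10, iters=200):
    # Run the accelerated method in stages: each stage warm-starts from the
    # previous solution and uses a smaller smoothing parameter.
    # (Hypothetical schedule: mu is divided by `shrink` after every stage.)
    x = np.zeros(A.shape[1])
    mu = mu0
    for _ in range(stages):
        x = apg(A, b, lam, mu, x, iters)
        mu /= shrink
    return x
```

The warm start is the point of the design: early stages with a large smoothing parameter are cheap and move the iterate close to the solution set, so later stages with a small parameter (and a large Lipschitz constant) start from a much better point than a single run with the final parameter would.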






Reviews: Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/\epsilon)$

Neural Information Processing Systems

The submission considers algorithms for solving a specific class of optimization problems, namely $\min_{x \in \Omega_1} F(x)$, where $F(x) = \max_{u \in \Omega_2} \langle Ax, u \rangle - \phi(u) + g(x)$. Here, $g$ is convex, $\Omega_1$ is closed and convex, $\Omega_2$ is closed, convex, and bounded, and the set of optimal solutions $\Omega_* \subset \Omega_1$ is convex, compact, and non-empty. The submission also assumes that a proximal mapping for $g$ can be computed efficiently. The above framework is apparently general enough to capture a number of applications, including various natural regularized empirical loss minimization problems that arise in machine learning. Classic work of Nesterov combined a smooth approximation technique with accelerated proximal gradient descent to converge to a solution within $\epsilon$ of optimal in $O(1/\epsilon)$ iterations.
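For context, the classic smoothing argument behind the $O(1/\epsilon)$ rate can be written out as follows. This is a sketch of the standard construction, not a quotation from the submission. It writes $f$ for the max term of $F$ and $F_*$ for the optimal value, and it assumes that $\phi$ is convex and that $\omega$ is a non-negative, 1-strongly convex function on $\Omega_2$; the symbols $\omega$, $\mu$, $D^2$, $x_0$, and $x_*$ are notation introduced here.

```latex
% Smoothed surrogate of the max-structured term f
f_\mu(x) = \max_{u \in \Omega_2} \; \langle Ax, u \rangle - \phi(u) - \mu\,\omega(u),
\qquad D^2 = \max_{u \in \Omega_2} \omega(u)

% Approximation quality and smoothness of the surrogate
f_\mu(x) \;\le\; f(x) \;\le\; f_\mu(x) + \mu D^2,
\qquad \nabla f_\mu \ \text{is Lipschitz with constant} \ \frac{\|A\|^2}{\mu}

% Accelerated proximal gradient applied to f_\mu + g for t iterations
F(x_t) - F_* \;\le\; \mu D^2 + O\!\left( \frac{\|A\|^2 \, \|x_0 - x_*\|^2}{\mu \, t^2} \right)
```

Choosing $\mu$ proportional to $\epsilon / D^2$ makes the first term of order $\epsilon$, and the second term then falls below $\epsilon$ once $t$ is of order $D \, \|A\| \, \|x_0 - x_*\| / \epsilon$, which is the $O(1/\epsilon)$ iteration count mentioned in the review.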


Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/\epsilon)$

Xu, Yi, Yan, Yan, Lin, Qihang, Yang, Tianbao

Neural Information Processing Systems

In this paper, we develop a novel homotopy smoothing (HOPS) algorithm for solving a family of non-smooth problems composed of a non-smooth term with an explicit max-structure and either a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems without any strong convexity assumption is $O(1/\epsilon)$. In this work, we show that the proposed HOPS achieves a lower iteration complexity of $\tilde O(1/\epsilon^{1-\theta})$, with $\theta\in(0,1]$ capturing the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without a strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method and runs in stages, gradually decreasing the smoothing parameter stage by stage until it yields a sufficiently good approximation of the original function.


Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning

Colin, Igor, Santos, Ludovic Dos, Scaman, Kevin

arXiv.org Machine Learning

We investigate the theoretical limits of pipeline parallel learning of deep learning architectures, a distributed setup in which the computation is distributed per layer instead of per example. For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal. For non-smooth convex functions, we provide a novel algorithm coined Pipeline Parallel Random Smoothing (PPRS) that is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension. While the method still obeys a slow $\varepsilon^{-2}$ convergence rate, the depth-dependent part is accelerated, resulting in a near-linear speed-up and a convergence time that depends only slightly on the depth of the deep learning architecture. Finally, we perform an empirical analysis of the non-smooth non-convex case and show that, for difficult and highly non-smooth problems where the sample size is limited, such as few-shot or adversarial learning, PPRS outperforms more traditional optimization algorithms such as gradient descent and Nesterov's accelerated gradient descent.


Inexact Proximal Gradient Methods for Non-convex and Non-smooth Optimization

Gu, Bin, Huo, Zhouyuan, Huang, Heng

arXiv.org Machine Learning

Non-convex and non-smooth optimization plays an important role in machine learning. The proximal gradient method is one of the most important methods for solving non-convex and non-smooth problems, where a proximal operator needs to be solved exactly at each step. However, in many problems the proximal operator has no analytic solution, or an exact solution is expensive to obtain. In this paper, we propose inexact proximal gradient methods for non-convex and non-smooth optimization, namely a basic inexact proximal gradient method (IPG) and a Nesterov-accelerated inexact proximal gradient method (AIPG), which tolerate an error in the calculation of the proximal operator. Theoretical analysis shows that IPG and AIPG have the same convergence rates as in the error-free case, provided that the errors decrease at appropriate rates. Keywords: Non-convex optimization, non-smooth optimization, proximal gradient, inexact proximal operator, Nesterov's accelerated method


Homotopy Smoothing for Non-Smooth Problems with Lower Complexity than $O(1/\epsilon)$

Xu, Yi, Yan, Yan, Lin, Qihang, Yang, Tianbao

arXiv.org Machine Learning

In this paper, we develop a novel homotopy smoothing (HOPS) algorithm for solving a family of non-smooth problems composed of a non-smooth term with an explicit max-structure and either a smooth term or a simple non-smooth term whose proximal mapping is easy to compute. The best known iteration complexity for solving such non-smooth optimization problems without any strong convexity assumption is $O(1/\epsilon)$. In this work, we show that the proposed HOPS achieves a lower iteration complexity of $\widetilde O(1/\epsilon^{1-\theta})$ (where $\widetilde O(\cdot)$ suppresses a logarithmic factor), with $\theta\in(0,1]$ capturing the local sharpness of the objective function around the optimal solutions. To the best of our knowledge, this is the lowest iteration complexity achieved so far for the considered non-smooth optimization problems without a strong convexity assumption. The HOPS algorithm employs Nesterov's smoothing technique and Nesterov's accelerated gradient method and runs in stages, gradually decreasing the smoothing parameter stage by stage until it yields a sufficiently good approximation of the original function. We show that HOPS enjoys linear convergence for many well-known non-smooth problems (e.g., empirical risk minimization with a piecewise linear loss function and an $\ell_1$ norm regularizer, finding a point in a polyhedron, and cone programming). Experimental results verify the effectiveness of HOPS in comparison with Nesterov's smoothing algorithm and primal-dual style first-order methods.