SDCA


Dimension-Free Iteration Complexity of Finite Sum Optimization Problems

Neural Information Processing Systems

Many canonical machine learning problems boil down to a convex optimization problem with a finite sum structure. However, while much progress has been made in developing faster algorithms for this setting, the inherent limitations of these problems are not satisfactorily addressed by existing lower bounds. Indeed, current bounds focus on first-order optimization algorithms, and only apply in the often unrealistic regime where the number of iterations is less than $\mathcal{O}(d/n)$ (where $d$ is the dimension and $n$ is the number of samples). In this work, we extend the framework of Arjevani et al. \cite{arjevani2015lower,arjevani2016iteration} to provide new lower bounds which are dimension-free and go beyond the assumptions of current bounds, thereby covering standard finite sum optimization methods, e.g., SAG, SAGA, SVRG, and SDCA without duality, as well as stochastic coordinate-descent methods, such as SDCA and accelerated proximal SDCA.



Accelerated Mini-Batch Stochastic Dual Coordinate Ascent

Neural Information Processing Systems

Stochastic dual coordinate ascent (SDCA) is an effective technique for solving regularized loss minimization problems in machine learning. This paper considers an extension of SDCA under the mini-batch setting that is often used in practice. Our main contribution is to introduce an accelerated mini-batch version of SDCA and prove a fast convergence rate for this method. We discuss an implementation of our method over a parallel computing system, and compare the results to both the vanilla stochastic dual coordinate ascent and to the accelerated deterministic gradient descent method of Nesterov [2007].
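For smooth losses, the dual coordinate maximization that vanilla SDCA performs has a closed form. As a minimal sketch of the plain (non-accelerated, single-coordinate) method the abstract builds on, here is SDCA for ridge regression; the squared-loss objective and function name are illustrative choices, not the paper's accelerated mini-batch variant:

```python
import numpy as np

def sdca_ridge(X, y, lam=0.1, epochs=100, seed=0):
    """Vanilla SDCA sketch for ridge regression:
        min_w (1/n) * sum_i (x_i . w - y_i)^2 + (lam/2) * ||w||^2.
    Maintains dual variables alpha with w = X.T @ alpha / (lam * n)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = np.zeros(n)                 # one dual variable per example
    w = np.zeros(d)                     # primal iterate, kept in sync with alpha
    sq_norms = (X ** 2).sum(axis=1)     # precomputed ||x_i||^2
    for _ in range(epochs):
        for i in rng.permutation(n):
            # Closed-form maximizer of the dual objective in coordinate i
            # (derived from the conjugate of the squared loss).
            delta = (y[i] - X[i] @ w - alpha[i] / 2) / (0.5 + sq_norms[i] / (lam * n))
            alpha[i] += delta
            w += delta * X[i] / (lam * n)
    return w
```

Each pass touches coordinates in random order; the accelerated mini-batch variant described in the paper instead updates a sampled batch of dual coordinates per step and adds a Nesterov-style momentum sequence.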



Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. The authors propose an accelerated proximal block coordinate descent algorithm, describe its application to standard regularized loss minimization problems, and conclude with experiments on a smoothed SVM. On the question of clarity: I found the paper on the whole difficult to follow, with the authors showing a marked preference for writing equations in lieu of explanations. There are also numerous small grammatical errors. I'm not aware of other algorithms that are designed to work on block-coordinate problems (although single-coordinate algorithms are common enough), and have to question the advantage of this formulation, aside from being slightly more general. Given that the application considered in section 4 is single-coordinate (am I correct about this?), it might simplify the presentation to work from a single-coordinate formulation, and merely mention that block-coordinate updates are also possible.




Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

Submitted by Assigned_Reviewer_1

Q1: The authors propose a non-uniform sampling scheme for variance reduced SGD type algorithms based on local smoothness and the fact that the gradient of many individual losses is constant. The authors show that such a scheme is able to outperform uniform sampling for SVRG and SDCA. Overall the idea is an interesting one and seems to perform well in practice. However, I feel that the paper has some major clarity issues. In general, I find the paper quite difficult to read.


SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives

Aaron Defazio, Francis Bach, Simon Lacoste-Julien

Neural Information Processing Systems

In this work we introduce a new optimisation method called SAGA in the spirit of SAG, SDCA, MISO and SVRG, a set of recently proposed incremental gradient algorithms with fast linear convergence rates. SAGA improves on the theory behind SAG and SVRG, with better theoretical convergence rates, and has support for composite objectives where a proximal operator is used on the regulariser. Unlike SDCA, SAGA supports non-strongly convex problems directly, and is adaptive to any inherent strong convexity of the problem. We give experimental results showing the effectiveness of our method.
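The abstract's two distinguishing claims, an unbiased gradient estimate built from a table of stored per-example gradients and proximal support for composite objectives, can be seen in a short sketch. The example below applies the SAGA-style update to a lasso problem; the objective, step size, and function name are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def saga_lasso(X, y, lam=0.1, epochs=60, seed=0):
    """SAGA-style sketch for min_w (1/(2n))*||Xw - y||^2 + lam*||w||_1.
    For squared loss, each stored gradient r_i * x_i is summarized by the
    scalar residual r_i, so the gradient table costs O(n) memory here."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    L = (X ** 2).sum(axis=1).max()   # per-example smoothness constant
    gamma = 1.0 / (3 * L)            # step size in the spirit of the paper
    w = np.zeros(d)
    resid = X @ w - y                # stored residuals (gradient table)
    g_avg = X.T @ resid / n          # running average of stored gradients
    for _ in range(epochs):
        for i in rng.integers(0, n, size=n):
            r_new = X[i] @ w - y[i]
            # Unbiased estimate: new grad - stored grad + table average.
            v = (r_new - resid[i]) * X[i] + g_avg
            # Refresh the table entry and its running average.
            g_avg += (r_new - resid[i]) * X[i] / n
            resid[i] = r_new
            # Gradient step, then prox of the l1 regulariser (soft-threshold).
            w = w - gamma * v
            w = np.sign(w) * np.maximum(np.abs(w) - gamma * lam, 0.0)
    return w
```

The proximal step is where composite objectives enter: swapping the soft-threshold for another proximal operator handles other regularisers, and unlike SDCA no dual formulation or strong convexity is required.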