Reviews: Coordinate Descent with Bandit Sampling

Neural Information Processing Systems 

The paper introduces a coordinate descent algorithm with adaptive sampling à la Gauss-Southwell. Based on a descent lemma that quantifies the decrease of the objective function when a coordinate is selected, the authors propose the "max_r" strategy, which iteratively chooses the coordinate yielding the largest decrease. The paper follows recent developments on coordinate descent, notably (Csiba et al., 2015), (Nutini et al., 2015), and (Perekrestenko et al., 2017), with improved convergence bounds. As with previous adaptive sampling schemes, the proposed method requires a per-iteration computational cost equivalent to that of a full gradient computation, which can be prohibitive in large-scale optimization problems. To overcome this issue, the authors propose to learn the best coordinate by approximating the "max_r" strategy with a bandit algorithm.
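To make the reviewed selection rule concrete, here is a minimal sketch (my own illustration, not the paper's exact algorithm) of greedy coordinate descent on a quadratic objective, where each coordinate is scored by its guaranteed decrease, a Gauss-Southwell-style "max_r" rule. Note how scoring every coordinate costs as much as a full gradient evaluation per iteration, which is exactly the bottleneck the bandit approximation targets; the function name and quadratic test problem are my choices for illustration.

```python
import numpy as np

def greedy_coordinate_descent(A, b, n_iters=500):
    """Minimize f(x) = 0.5 x^T A x - b^T x by greedy coordinate descent.

    Illustrative sketch only: at each step, score every coordinate by its
    guaranteed objective decrease and update the best one ("max_r"-style
    Gauss-Southwell selection).
    """
    n = len(b)
    x = np.zeros(n)
    L = np.diag(A).copy()  # per-coordinate Lipschitz constants for this quadratic
    for _ in range(n_iters):
        grad = A @ x - b              # full gradient: as costly as gradient descent
        scores = grad**2 / (2 * L)    # guaranteed decrease if coordinate i is updated
        i = int(np.argmax(scores))    # greedy "max_r" pick: largest guaranteed decrease
        x[i] -= grad[i] / L[i]        # exact minimization along coordinate i
    return x

# Usage: a small strongly convex quadratic, solved to high accuracy.
rng = np.random.default_rng(0)
M = rng.standard_normal((5, 5))
A = M.T @ M + 5 * np.eye(5)   # well-conditioned symmetric positive definite matrix
b = rng.standard_normal(5)
x = greedy_coordinate_descent(A, b)
```

The `scores` line is where the full-gradient cost enters: replacing that exact argmax with a bandit estimate is the core idea of the paper.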