Reviews: The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares

Neural Information Processing Systems 

The paper considers SGD for least squares regression and establishes results for the last iterate (as is often used in practice), as opposed to an average over many iterates (as is often analyzed in theory). The tools are not new, so the work is somewhat incremental in that sense, but the paper is well written and addresses a core problem, and is therefore of interest.
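To make the setting concrete, here is a minimal sketch of SGD on least squares with a step decay (geometrically decaying) learning rate, returning the last iterate rather than an iterate average. The problem dimensions, decay factor, number of phases, and step sizes below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares problem: y = X @ w_star + noise (sizes are illustrative).
n, d = 2000, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star + 0.1 * rng.normal(size=n)

def sgd_step_decay(X, y, eta0=0.05, T=2000, phases=10):
    """SGD with a step decay schedule: the learning rate is held constant
    within each phase and halved between phases (geometric decay)."""
    n, d = X.shape
    w = np.zeros(d)
    phase_len = T // phases
    eta = eta0
    for t in range(T):
        if t > 0 and t % phase_len == 0:
            eta *= 0.5  # geometric (step) decay between phases
        i = rng.integers(n)
        # Stochastic gradient of 0.5 * (x_i . w - y_i)^2
        grad = (X[i] @ w - y[i]) * X[i]
        w -= eta * grad
    return w  # last iterate, not an average over iterates

w_hat = sgd_step_decay(X, y)
print("parameter error:", np.linalg.norm(w_hat - w_star))
```

The paper's analysis concerns exactly this kind of last-iterate guarantee, whereas much prior theory averages the iterates to obtain its rates.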