Reviews: The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
–Neural Information Processing Systems
The paper considers SGD for least squares regression and establishes results for the last iterate (as is typically used in practice) rather than for an average over many iterates (as is typically analyzed in theory). The tools are not new, so the contribution is somewhat incremental in that sense, but the paper is well written and addresses a core problem, so it is of interest.
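For readers unfamiliar with the setting, the distinction between the last iterate and the averaged iterate can be sketched as follows. This is only an illustration, not the paper's algorithm or its constants: the function names `step_decay_lr` and `sgd_least_squares`, the halving factor, and the choice of roughly log2(T) equal-length phases are all assumptions made for the example.

```python
import math
import random


def step_decay_lr(t, total_steps, eta0, factor=0.5):
    """Learning rate at step t: eta0 multiplied by `factor` once per phase.

    Assumption: about log2(total_steps) phases of equal length.
    """
    num_phases = max(1, int(math.log2(total_steps)))
    phase_len = max(1, total_steps // num_phases)
    phase = min(t // phase_len, num_phases - 1)
    return eta0 * factor ** phase


def sgd_least_squares(data, dim, total_steps, eta0, seed=0):
    """Run SGD on the squared loss; return (last iterate, averaged iterate)."""
    rng = random.Random(seed)
    w = [0.0] * dim
    w_sum = [0.0] * dim
    for t in range(total_steps):
        x, y = rng.choice(data)  # sample one (features, target) pair
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        lr = step_decay_lr(t, total_steps, eta0)
        # Stochastic gradient of 0.5 * (w.x - y)^2 is residual * x.
        w = [wi - lr * residual * xi for wi, xi in zip(w, x)]
        w_sum = [s + wi for s, wi in zip(w_sum, w)]
    w_avg = [s / total_steps for s in w_sum]
    return w, w_avg


# Toy noiseless problem whose exact solution is w* = [2.0, -1.0].
toy_data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0)]
w_last, w_avg = sgd_least_squares(toy_data, dim=2, total_steps=1024, eta0=0.5)
```

Here `w_last` is what a practitioner would typically deploy, while `w_avg` is the quantity that classical analyses usually bound.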
Jan-22-2025, 15:34:08 GMT