Reviews: Optimal Learning for Multi-pass Stochastic Gradient Methods
Neural Information Processing Systems
This work provides a strong contribution: it appears to be the first to obtain optimal rates (up to log factors) for SGM, and it also handles a mini-batch analysis that includes the (full) batch method as a special case. Such rates had previously been established only for the (batch) ridge regression method. My interpretation of what the results actually show is given in the Summary. I find the current reliance on cross-validation for adaptivity to be somewhat of an inelegant cop-out (even if there is a theoretically supported method for using it); given that several of your corollaries provide guarantees in which \zeta and \gamma enter the picture only through the stopping time T*, can you provide a self-monitoring method that decides when to stop? In particular, I find the most exciting results to be Corollaries 3.3 and 3.9, as only the stopping time depends on the (unknown) capacity parameters, and so such an online stopping mechanism might be possible.
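To make the suggestion concrete, the kind of self-monitoring rule I have in mind would track a hold-out estimate of the risk across passes and stop when it stalls. The sketch below is purely illustrative (all function names, parameters, and the patience-based stopping criterion are my own assumptions, not the authors' theoretically supported cross-validation procedure): mini-batch SGM for least squares, stopped when the validation error stops improving.

```python
import numpy as np

def sgm_holdout_stop(X, y, batch_size=8, step=0.1, max_passes=50,
                     patience=5, val_frac=0.2, seed=0):
    """Mini-batch SGM for least squares with a hold-out stopping rule.

    Hypothetical sketch: the split, step size, and patience criterion
    are illustrative choices, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    perm = rng.permutation(n)
    n_val = max(1, int(val_frac * n))
    Xv, yv = X[perm[:n_val]], y[perm[:n_val]]      # hold-out set
    Xt, yt = X[perm[n_val:]], y[perm[n_val:]]      # training set

    w = np.zeros(d)
    best_w, best_err, stall = w.copy(), np.inf, 0
    for _ in range(max_passes):
        order = rng.permutation(len(yt))
        for start in range(0, len(yt), batch_size):
            idx = order[start:start + batch_size]
            # mini-batch gradient of the least-squares loss
            grad = Xt[idx].T @ (Xt[idx] @ w - yt[idx]) / len(idx)
            w -= step * grad
        err = np.mean((Xv @ w - yv) ** 2)          # hold-out risk estimate
        if err < best_err - 1e-8:
            best_err, best_w, stall = err, w.copy(), 0
        else:
            stall += 1
            if stall >= patience:  # validation error has stalled: stop
                break
    return best_w, best_err
```

The appeal of Corollaries 3.3 and 3.9 is precisely that only the stopping time depends on the unknown parameters, so a monitored quantity like the hold-out error above could in principle replace knowledge of \zeta and \gamma.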
Jan-20-2025, 23:24:36 GMT