Reviews: Optimal Learning for Multi-pass Stochastic Gradient Methods
Neural Information Processing Systems
This work provides a strong contribution: it appears to be the first to obtain optimal rates (up to log factors) for SGM, and it also handles a mini-batch analysis that includes the (full) batch method as a special case. Such rates had previously been established only for the (batch) ridge regression method. My interpretation of what the results actually show is given in the Summary. I find the current reliance on cross-validation for adaptivity to be somewhat of an inelegant cop-out (even if there is a theoretically supported method for using it); given that several of your corollaries provide guarantees in which \zeta and \gamma enter the picture only through the stopping time T*, can you provide a self-monitoring method that decides when to stop? In particular, I find the most exciting results to be Corollaries 3.3 and 3.9, as only the stopping time depends on the (unknown) capacity parameters, and so such an online stopping mechanism might be possible.
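To make the suggestion concrete, the kind of self-monitoring rule I have in mind would track a hold-out estimate of the risk across passes and stop when it stalls. The sketch below is purely illustrative (all function names, parameters, and the patience-based stopping criterion are my own assumptions, not the authors' theoretically supported cross-validation procedure): mini-batch SGM for least squares, stopped when the validation error stops improving.

```python
import numpy as np

def sgm_holdout_stop(X, y, batch_size=8, step=0.1, max_passes=50,
                     patience=5, val_frac=0.2, seed=0):
    """Mini-batch SGM for least squares with a hold-out stopping rule.

    Hypothetical sketch: the split, step size, and patience criterion
    are illustrative choices, not the paper's method.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    perm = rng.permutation(n)
    n_val = max(1, int(val_frac * n))
    Xv, yv = X[perm[:n_val]], y[perm[:n_val]]      # hold-out set
    Xt, yt = X[perm[n_val:]], y[perm[n_val:]]      # training set

    w = np.zeros(d)
    best_w, best_err, stall = w.copy(), np.inf, 0
    for _ in range(max_passes):
        order = rng.permutation(len(yt))
        for start in range(0, len(yt), batch_size):
            idx = order[start:start + batch_size]
            # mini-batch gradient of the least-squares loss
            grad = Xt[idx].T @ (Xt[idx] @ w - yt[idx]) / len(idx)
            w -= step * grad
        err = np.mean((Xv @ w - yv) ** 2)          # hold-out risk estimate
        if err < best_err - 1e-8:
            best_err, best_w, stall = err, w.copy(), 0
        else:
            stall += 1
            if stall >= patience:  # validation error has stalled: stop
                break
    return best_w, best_err
```

The appeal of Corollaries 3.3 and 3.9 is precisely that only the stopping time depends on the unknown parameters, so a monitored quantity like the hold-out error above could in principle replace knowledge of \zeta and \gamma.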
Jan-20-2025, 23:24:36 GMT