Review for NeurIPS paper: Stochasticity of Deterministic Gradient Descent: Large Learning Rate for Multiscale Objective Function
Neural Information Processing Systems
Additional Feedback: [After rebuttal] I appreciate the additional explanations in the rebuttal. I think the example (in a more complete version) will go a long way toward improving the paper, but as presented, not enough detail is given for a proper evaluation; I therefore look forward to reading a revised version of this work. Note that my tautology comment does not say the proof is trivial, but rather that the way it is written masks the potential insights the proof may offer. In particular, there should be a result showing that such a limit in Cons 1 exists under some general conditions characterising the data and the model architecture; on first reading, these conditions do not appear well-motivated. I believe the example provided in the rebuttal may be useful for formalising this.
Jan-22-2025, 06:47:20 GMT