Reviews: Momentum-Based Variance Reduction in Non-Convex SGD

Jan-26-2025, 15:54:52 GMT–Neural Information Processing Systems

I agree with R3 that you did a poor job of relating your work to existing methods, in particular SARAH. Please also make sure that you carefully address the question of optimality. I also realized that your method in fact has nothing to do with momentum. Consider, for instance, a deterministic objective, f(x, \xi) = f(x). If one has a tight estimate, i.e. d_{t-1} = \nabla f(x_{t-1}), then from your update rules it follows that d_t = \nabla f(x_t), i.e. the method becomes gradient descent with no momentum! Your title is thus very confusing, and I highly encourage you to change it.
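The deterministic argument above can be checked numerically. The sketch below is only an illustration: the review does not quote the update rule, so the recursion d_t = \nabla f(x_t, \xi_t) + (1 - a)(d_{t-1} - \nabla f(x_{t-1}, \xi_t)) is an assumption about the form of the method under review, and the toy objective, the step size `lr`, the weight `a`, and the helper names are all hypothetical. With f(x, \xi) = f(x) and d_0 = \nabla f(x_0), the correction term should vanish at every step, whatever the value of a.

```python
def grad(x):
    """Gradient of a toy deterministic, non-convex objective f(x) = x**4 / 4 - x**2 / 2."""
    return x**3 - x


def max_estimator_gap(x0, a, lr, steps):
    """Run the assumed recursive estimator and return the largest |d_t - grad(x_t)| observed."""
    x_prev = x0
    d = grad(x0)  # tight initial estimate: d_0 = grad f(x_0)
    worst = 0.0
    for _ in range(steps):
        x = x_prev - lr * d  # descent step along the current estimate
        # Assumed update: the same (here trivial) sample is used at x and x_prev,
        # so the stochastic gradients reduce to the exact gradient.
        d = grad(x) + (1.0 - a) * (d - grad(x_prev))
        worst = max(worst, abs(d - grad(x)))
        x_prev = x
    return worst


# Hypothetical settings, chosen only for illustration.
gaps = {a: max_estimator_gap(x0=2.0, a=a, lr=0.1, steps=50) for a in (0.0, 0.1, 0.5, 1.0)}
print(gaps)
```

Under these assumptions the gap stays at zero for every choice of a, so the iterates coincide with plain gradient descent, which is the point the review makes about the missing momentum effect in the deterministic case.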

experiment, momentum-based variance reduction, non-convex sgd, (2 more...)


