Review for NeurIPS paper: Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations
Neural Information Processing Systems
Weaknesses: While the "biased expectation" appears to be a powerful tool, the overall results are restricted to the gradient of the algorithm at _some_ time t among the last T iterates. While this is a common outcome of the standard analysis of SGD, it would be nice if (with some additional assumptions on f) the results could be transferred to f(x_t), or to x_t within some basin of attraction.

The special case s = 0 needs much more detailed treatment. While the authors point out in the supplement that \phi is continuous at s = 0, much of the document switches between the limit s → 0 and the value s = 0 without explanation.

Assumption 1: I see that the authors need to control \|X_t\|_2 in Thm 1. (Eq.
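To make the first point concrete: a minimal, self-contained sketch (my own illustration, not the paper's algorithm) of why the standard non-convex SGD guarantee controls only the best (or a uniformly sampled) iterate among the last T, rather than the final iterate x_T. The test function f(x) = x^4 - x^2 and the noise level are hypothetical choices for illustration.

```python
import random

def grad(x):
    # Gradient of the non-convex test function f(x) = x**4 - x**2
    # (an assumed toy example, not the objective from the paper).
    return 4 * x**3 - 2 * x

random.seed(0)
x, eta, T = 2.0, 0.01, 1000
grad_norms = []
for t in range(T):
    g = grad(x) + random.gauss(0.0, 0.1)  # stochastic gradient oracle
    x -= eta * g
    grad_norms.append(abs(grad(x)))

# The standard analysis bounds min_t |f'(x_t)| (equivalently, the gradient
# at a uniformly random iterate), not |f'(x_T)| at the last iterate.
print("best iterate:", min(grad_norms))
print("last iterate:", grad_norms[-1])
```

The gap between the two printed quantities is exactly what extra assumptions on f (e.g. a local growth condition inside a basin of attraction) would be needed to close.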
Feb-5-2025, 08:23:10 GMT