Review for NeurIPS paper: Robustness Analysis of Non-Convex Stochastic Gradient Descent using Biased Expectations
Neural Information Processing Systems
Weaknesses: While the "biased expectation" appears to be a powerful tool, the overall results are restricted to the gradient of the algorithm at _some_ time t among the last T iterates. While this is a common outcome of the standard analysis of SGD, it would be nice if (with some additional assumptions on f) the results could be transferred to f(x_t), or to x_t within some basin of attraction.

The special case s = 0 needs much more detailed treatment. While the authors point out in the supplement that \phi is continuous at s = 0, much of the document switches between the limit s → 0 and the value s = 0 without explanation.

Assumption 1: I see that the authors need to control \|X_t\|_2 in Thm 1. (Eq.
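To make the first point concrete: a minimal, self-contained sketch (my own illustration, not the paper's algorithm) of why the standard non-convex SGD guarantee controls only the best (or a uniformly sampled) iterate among the last T, rather than the final iterate x_T. The test function f(x) = x^4 - x^2 and the noise level are hypothetical choices for illustration.

```python
import random

def grad(x):
    # Gradient of the non-convex test function f(x) = x**4 - x**2
    # (an assumed toy example, not the objective from the paper).
    return 4 * x**3 - 2 * x

random.seed(0)
x, eta, T = 2.0, 0.01, 1000
grad_norms = []
for t in range(T):
    g = grad(x) + random.gauss(0.0, 0.1)  # stochastic gradient oracle
    x -= eta * g
    grad_norms.append(abs(grad(x)))

# The standard analysis bounds min_t |f'(x_t)| (equivalently, the gradient
# at a uniformly random iterate), not |f'(x_T)| at the last iterate.
print("best iterate:", min(grad_norms))
print("last iterate:", grad_norms[-1])
```

The gap between the two printed quantities is exactly what extra assumptions on f (e.g. a local growth condition inside a basin of attraction) would be needed to close.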
Feb-5-2025, 08:23:10 GMT