Review for NeurIPS paper: Escaping Saddle-Point Faster under Interpolation-like Conditions
Neural Information Processing Systems
Weaknesses:
- The importance of the SGC condition remains unclear. In Line 129, the authors claim that the SGC condition is satisfied in some practical settings, such as the training of deep neural networks, and that it should therefore be regarded as an interesting special setting for nonconvex optimization. However, recent work [1,2] showed that the training of deep neural networks can instead be treated as a special case of convex optimization in the neural tangent kernel (NTK) regime, which is a stronger condition than SGC. The authors may therefore want to clarify the importance of SGC by providing additional examples from machine learning.
- As the authors note, [VBS18] first studied the SGC condition in the nonconvex setting and showed that SGD requires O(1/\epsilon^2) gradient complexity to find first-order stationary points. Meanwhile, note that [AZL18] proposed a generic framework that turns any algorithm for finding first-order stationary points into one for finding approximate local minimizers, without hurting the convergence rate.
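For reference, the strong growth condition (SGC) discussed above is usually stated as follows; this is a sketch of the standard form from the SGC literature (e.g., [VBS18]), with the constant \rho and the notation f, F, \xi being assumptions rather than quantities taken from the paper under review:

```latex
% Strong growth condition (SGC): for some constant \rho \ge 1,
% the stochastic gradients satisfy
\[
  \mathbb{E}_{\xi}\!\left[\,\|\nabla f(x;\xi)\|^{2}\,\right]
  \;\le\; \rho\,\|\nabla F(x)\|^{2}
  \qquad \text{for all } x,
\]
% where F(x) = \mathbb{E}_{\xi}[f(x;\xi)] is the population objective.
% Note that SGC forces every stationary point of F to be a stationary
% point of each f(\cdot;\xi), i.e., it implies interpolation; this is
% why over-parameterized models are the motivating example.
```

This makes the comparison concrete: the NTK regime additionally linearizes the model around initialization, which is why it can be viewed as a strictly stronger assumption than SGC alone.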