Review for NeurIPS paper: Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Jan-23-2025, 02:06:36 GMT – Neural Information Processing Systems

Weaknesses:
- Below eq (3), in the upper bound on \delta_t, the right-hand side should be 2\sum_s\eta_sa_s instead of 2\sum_s\eta_sa_s\delta_s.
- It would be interesting to add some discussion of, or comparison with, the reference mentioned below: 1. "Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent". That paper relaxes smoothness to \alpha-Hölder continuity of (sub)gradients, which includes the nonsmooth loss functions considered in this paper as the case \alpha = 0. Its stability analysis also implies the optimal generalization bound O(1/\sqrt{n}) for multi-pass SGD with T = O(n^2).
- It seems to me that the main technical novelty lies in the proof of Lemma 3, which studies \delta_t^2 (as opposed to \delta_t in Hardt et al.'s paper) using the approximate contraction of the gradient mapping for nonsmooth losses. Similar ideas have already been explored in the above reference, in a more general setting.
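For readers unfamiliar with the quantity under discussion, the following is a minimal illustrative sketch of what \delta_t measures: the distance between two SGD trajectories run on datasets that differ in a single example. It is not taken from the paper under review or from the cited reference. The hinge loss, the unit-norm features, the constant step size 1/\sqrt{T}, and all function names are assumptions made only for illustration.

```python
import math
import random


def hinge_subgradient(w, x, y):
    """Return one subgradient of the hinge loss max(0, 1 - y*<w, x>) in w."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1.0:
        return [-y * xi for xi in x]
    return [0.0] * len(w)


def sgd_path(data, indices, step_sizes, dim):
    """Run subgradient SGD from w = 0 and return the iterate after each step."""
    w = [0.0] * dim
    path = []
    for i, eta in zip(indices, step_sizes):
        x, y = data[i]
        g = hinge_subgradient(w, x, y)
        w = [wi - eta * gi for wi, gi in zip(w, g)]
        path.append(list(w))
    return path


def stability_gaps(n=50, dim=5, T=200, seed=0):
    """Return delta_t = ||w_t - w_t'|| for SGD on two neighbouring datasets."""
    rng = random.Random(seed)

    def sample():
        # Unit-norm features make the hinge loss 1-Lipschitz in w.
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        return [xi / norm for xi in x], rng.choice([-1.0, 1.0])

    data = [sample() for _ in range(n)]
    neighbour = list(data)
    neighbour[0] = sample()  # the two datasets differ in exactly one example

    # Both runs share the same sampled indices and the same step sizes.
    indices = [rng.randrange(n) for _ in range(T)]
    step_sizes = [1.0 / math.sqrt(T)] * T

    path_a = sgd_path(data, indices, step_sizes, dim)
    path_b = sgd_path(neighbour, indices, step_sizes, dim)
    return [math.dist(a, b) for a, b in zip(path_a, path_b)]
```

Calling `stability_gaps()` returns the sequence \delta_1, ..., \delta_T for one random run. Because each subgradient here has norm at most 1, every \delta_t is at most 2\sum_{s \le t}\eta_s by the triangle inequality. That crude bound is not the one the review refers to; it is mentioned only as a sanity check on the sketch.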


