nonsmooth convex loss
Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
Uniform stability is a notion of algorithmic stability that bounds the worst case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. [2016] provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding of the generalization properties of SGD and several applications to differentially private convex optimization for smooth losses. Our work is the first to address uniform stability of SGD on nonsmooth convex losses. Specifically, we provide sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses.
Review for NeurIPS paper: Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
Weaknesses: - Below eq (3), for the upper bound of \delta_t the right-hand side should be 2\sum_s\eta_sa_s instead of 2\sum_s\eta_sa_s\delta_s . It would be interesting to add some discussions or comparison with these references mentioned below: 1. "Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent". In this paper, their work relaxes the smoothness to \alpha -Holder continuity of (sub)gradients, which include the non-smooth loss functions in this paper as \alpha 0 . Their stability analysis also improves the optimal generalization bounds O(1/\sqrt{n}) for multi-pass SGD with T O(n 2) . It seems to me that the main technical novelty appeared in the proof of Lemma 3 which studied \delta_t 2 (as opposed to the study of \delta_t in Hardt et al's paper) using the approximate contraction for the gradient mapping for the non-smooth loss which has already explored in the above paper. Similar ideas have already explored in the above reference in a more general setting.
Review for NeurIPS paper: Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
This paper got high scores: 9,7,6,8, all with high confidence. The major concerns are from Reviewer #3, who asked about the relationship with "Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent" ([*]) and whether the technique of using Poisson sampling in [1] can be used in the current work which uses uniform sampling. While the authors clarified in the rebuttal that their work is stronger than [*] and some of the results in [*] may not be generalized to the case in the current paper, and the technique in [1] can be applied straightforwardly, Reviewer #3 further refuted on the first claim and doubted on the second claim during discussion. The AC confirmed that this paper is concurrent with [*] and deemed that Reviewer #3 may have missed the sketched proof in the "Privacy Analysis" section of the rebuttal. During further discussion, Reviewer #3 acknowledged that the sketched proof made sense and supported acceptance.
Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
Uniform stability is a notion of algorithmic stability that bounds the worst case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. [2016] provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding of the generalization properties of SGD and several applications to differentially private convex optimization for smooth losses. Our work is the first to address uniform stability of SGD on nonsmooth convex losses. Specifically, we provide sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses.