 optimal generalization


Review for NeurIPS paper: Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses

Neural Information Processing Systems

Weaknesses: - Below eq. (3), in the upper bound on \delta_t, the right-hand side should be 2\sum_s \eta_s a_s instead of 2\sum_s \eta_s a_s \delta_s. It would be interesting to add some discussion of, or comparison with, the references listed below: 1. "Fine-Grained Analysis of Stability and Generalization for Stochastic Gradient Descent". That paper relaxes smoothness to \alpha-Hölder continuity of (sub)gradients, which includes the non-smooth loss functions considered in this paper as the case \alpha = 0. Their stability analysis also improves the optimal generalization bounds O(1/\sqrt{n}) for multi-pass SGD with T = O(n^2). It seems to me that the main technical novelty appears in the proof of Lemma 3, which studies \delta_t^2 (as opposed to \delta_t, as in Hardt et al.'s paper) using the approximate contraction of the gradient mapping for non-smooth losses. Similar ideas have already been explored in the above reference, in a more general setting.
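As a rough illustration of the quantity \delta_t discussed in the review above, the sketch below runs projected SGD on two datasets that differ in a single example and records \delta_t = ||w_t - w'_t|| at every step. It is not taken from the paper or the review: the hinge loss, the unit-ball projection, the constant step size, the synthetic data, and all function names (`stability_gaps`, `sgd_path`, and so on) are assumptions chosen only to make the idea concrete.

```python
import math
import random


def hinge_subgradient(w, x, y):
    """Return one subgradient of max(0, 1 - y * <w, x>) with respect to w."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    if margin < 1.0:
        return [-y * xi for xi in x]
    return [0.0] * len(w)


def project_to_ball(w, radius):
    """Euclidean projection onto the ball of the given radius (assumed domain)."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    if norm <= radius:
        return w
    return [wi * radius / norm for wi in w]


def sgd_path(data, indices, step_size, radius=1.0):
    """Run projected SGD from w = 0 and return the iterate after every step."""
    w = [0.0] * len(data[0][0])
    path = []
    for i in indices:
        x, y = data[i]
        g = hinge_subgradient(w, x, y)
        w = project_to_ball([wi - step_size * gi for wi, gi in zip(w, g)], radius)
        path.append(w)
    return path


def random_example(rng, dim):
    """Synthetic example: unit-norm features and a random label in {-1, +1}."""
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x], rng.choice([-1.0, 1.0])


def make_neighbouring_datasets(n, dim, seed):
    """Build a dataset and a copy in which only example 0 is replaced."""
    rng = random.Random(seed)
    data = [random_example(rng, dim) for _ in range(n)]
    neighbour = list(data)
    neighbour[0] = random_example(rng, dim)
    return data, neighbour


def stability_gaps(n=50, dim=5, passes=2, step_size=0.05, seed=0):
    """Return delta_t = ||w_t - w'_t|| when both runs share the sampled indices."""
    data, neighbour = make_neighbouring_datasets(n, dim, seed)
    rng = random.Random(seed + 1)
    indices = [rng.randrange(n) for _ in range(passes * n)]
    path_a = sgd_path(data, indices, step_size)
    path_b = sgd_path(neighbour, indices, step_size)
    return [
        math.sqrt(sum((a - b) ** 2 for a, b in zip(wa, wb)))
        for wa, wb in zip(path_a, path_b)
    ]
```

Calling `stability_gaps()` returns the list of \delta_t values for one random run. Because the features have unit norm, the hinge loss is 1-Lipschitz here, so each \delta_t is at most 2 * step_size * t in this sketch; that is only a crude sanity check, not the bound analysed in the paper under review.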


Training Data Selection for Optimal Generalization in Trigonometric Polynomial Networks

Neural Information Processing Systems

In this paper, we consider the problem of active learning in trigonometric polynomial networks and give a necessary and sufficient condition of sample points to provide the optimal generalization capability. By analyzing the condition from the functional analytic point of view, we clarify the mechanism of achieving the optimal generalization capability. We also show that a set of training examples satisfying the condition does not only provide the optimal generalization but also reduces the computational complexity and memory required for the calculation of learning results. Finally, examples of sample points satisfying the condition are given and computer simulations are performed to demonstrate the effectiveness of the proposed active learning method.