$L_2$-Uniform Stability of Randomized Learning Algorithms: Sharper Generalization Bounds and Confidence Boosting

Neural Information Processing Systems

Exponential generalization bounds with near-optimal rates have recently been established for uniformly stable algorithms~\citep{feldman2019high,bousquet2020sharper}. We seek to extend these best-known high-probability bounds from deterministic learning algorithms to the regime of randomized learning. One simple approach is to define stability in expectation over the algorithm's randomness, which may yield a sharper stability parameter but only leads to guarantees on the on-average generalization error. Another natural option is to consider stability conditioned on the algorithm's randomness, which is far more stringent but may lead to generalization with high probability jointly over the randomness of sample and algorithm. The present paper addresses the tension between these two alternatives and makes progress towards relaxing it within a classic confidence-boosting framework. To this end, we first introduce a novel notion of $L_2$-uniform stability that holds uniformly over the data but only in second moment over the algorithm's randomness. As a core contribution of this work, we then prove a strong exponential bound on the first moment of the generalization error under $L_2$-uniform stability. As a consequence of this bound, we show that a bagging-based meta-algorithm attains near-optimal generalization with high probability jointly over the randomness of data and algorithm. We further instantiate these generic results for stochastic gradient descent (SGD) to derive sharper exponential bounds for convex and non-convex optimization with natural time-decaying learning rates, which have not been possible to prove with existing stability-based generalization guarantees.
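For orientation, a plausible formalization of the $L_2$-uniform stability notion described above (the notation $\gamma_n$, $\ell$, and $z$ is introduced here for illustration; the paper's exact definition may differ in constants or conditioning) is
\[
  \sup_{S \simeq S'} \sup_{z} \Big( \mathbb{E}_{A}\big[ \big( \ell(A(S); z) - \ell(A(S'); z) \big)^{2} \big] \Big)^{1/2} \le \gamma_n ,
\]
where $S \simeq S'$ ranges over training samples of size $n$ differing in a single example, $z$ is a test point, and the expectation is taken only over the algorithm's internal randomness; stability thus holds uniformly over the data but merely in second moment over the randomization.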


A PAC-Bayesian Analysis of Randomized Learning with Application to Stochastic Gradient Descent

Ben London

Neural Information Processing Systems

We study the generalization error of randomized learning algorithms--focusing on stochastic gradient descent (SGD)--using a novel combination of PAC-Bayes and algorithmic stability. Importantly, our generalization bounds hold for all posterior distributions on an algorithm's random hyperparameters, including distributions that depend on the training data.
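As background, a generic PAC-Bayes bound of the McAllester type (stated here for losses bounded in $[0,1]$, and not necessarily in the exact form proved in this paper) asserts that, with probability at least $1-\delta$ over a sample $S$ of size $n$, simultaneously for all posteriors $Q$,
\[
  \mathbb{E}_{h \sim Q}\big[ R(h) \big] \le \mathbb{E}_{h \sim Q}\big[ \widehat{R}_S(h) \big] + \sqrt{ \frac{ \mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta} }{ 2n } } ,
\]
where $P$ is a fixed prior, $R$ is the population risk and $\widehat{R}_S$ the empirical risk; the distinctive feature highlighted in the abstract is that $Q$ may be placed on the algorithm's random hyperparameters and may depend on the training data.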





Boosting the Confidence of Generalization for $L_2$-Stable Randomized Learning Algorithms

Yuan, Xiao-Tong, Li, Ping

arXiv.org Machine Learning

Exponential generalization bounds with near-tight rates have recently been established for uniformly stable learning algorithms. The notion of uniform stability, however, is stringent in the sense that it is invariant to the data-generating distribution. Under weaker, distribution-dependent notions of stability such as hypothesis stability and $L_2$-stability, the literature suggests that only polynomial generalization bounds are possible in general. The present paper addresses this long-standing tension between the two regimes of results and makes progress towards relaxing it within a classic confidence-boosting framework. To this end, we first establish an in-expectation first-moment generalization error bound for potentially randomized learning algorithms with $L_2$-stability, based on which we then show that a properly designed subbagging process leads to near-tight exponential generalization bounds over the randomness of both data and algorithm. We further instantiate these generic results for stochastic gradient descent (SGD) to derive improved high-probability generalization bounds for convex and non-convex optimization problems with natural time-decaying learning rates, which have not been possible to prove with existing hypothesis-stability or uniform-stability based results.
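As a minimal sketch of the subbagging construction referenced above (the symbols $k$, $m$, and $I_j$ are introduced here for illustration, and averaging is only one possible aggregation rule), one common instantiation trains on random subsamples and aggregates:
\[
  \bar{A}(S) \;=\; \frac{1}{k} \sum_{j=1}^{k} A\big( S_{I_j} \big) ,
\]
where each index set $I_j \subset \{1, \dots, n\}$ of size $m$ is drawn uniformly at random without replacement and $S_{I_j}$ is the corresponding subsample; aggregating the $k$ independently randomized runs is, roughly, what allows the in-expectation first-moment bound to be boosted into a high-probability guarantee.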


Information-Theoretic Bounds on the Moments of the Generalization Error of Learning Algorithms

Aminian, Gholamali, Toni, Laura, Rodrigues, Miguel R. D.

arXiv.org Machine Learning

Generalization error bounds are critical to understanding the performance of machine learning models. In this work, building upon a new bound on the expected value of an arbitrary function of the population and empirical risk of a learning algorithm, we offer a more refined analysis of the generalization behaviour of machine learning models based on a characterization of (bounds on) their generalization error moments. We discuss how the proposed bounds -- which also encompass new bounds on the expected generalization error -- relate to existing bounds in the literature. We also discuss how the proposed moment bounds can be used to construct new high-probability generalization error bounds.
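For context, a representative existing bound in this information-theoretic line (due to Xu and Raginsky, 2017) controls the expected generalization error of an algorithm with output $W = A(S)$ by the mutual information between the output and the training sample,
\[
  \big| \mathbb{E}\big[ R(W) - \widehat{R}_S(W) \big] \big| \;\le\; \sqrt{ \frac{ 2\sigma^{2} }{ n } \, I(W; S) } ,
\]
valid when the loss is $\sigma$-sub-Gaussian under the data distribution; the moment bounds proposed in the abstract above aim to refine this type of in-expectation statement by characterizing higher-order moments as well.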