Regularized Gradient Boosting
Corinna Cortes, Mehryar Mohri, Dmitry Storcheus
–Neural Information Processing Systems
Gradient Boosting (GB) is a popular and very successful ensemble method for binary trees. While various types of regularization of the base predictors are used with this algorithm, the theory that connects such regularizations with generalization guarantees is poorly understood. We fill this gap by deriving data-dependent learning guarantees for GB used with regularization, expressed in terms of the Rademacher complexities of the constrained families of base predictors. We introduce a new algorithm, called RGB, that directly benefits from these generalization bounds and that, at every boosting round, applies the Structural Risk Minimization principle to search for a base predictor with the best empirical fit versus complexity trade-off. Inspired by Randomized Coordinate Descent we provide a scalable implementation of our algorithm, able to search over large families of base predictors. Finally, we provide experimental results, demonstrating that our algorithm achieves significantly better out-of-sample performance on multiple datasets than the standard GB algorithm used with its regularization.
Neural Information Processing Systems
Jan-23-2025, 08:52:39 GMT