Behavior of linear L2-boosting algorithms in the vanishing learning rate asymptotic
Dombry, Clément; Esstafa, Youssef
In the past decades, boosting has become a major and powerful prediction method in machine learning. The success of the classification algorithm AdaBoost by Freund and Schapire (1999) showed that many weak learners can be combined sequentially to produce better predictions, with widespread applications in gene expression (Dudoit et al., 2002) or music genre identification (Bergstra et al., 2006), to name only a few. Friedman et al. (2000) identified a wider statistical framework that led to gradient boosting (Friedman, 2001), where a weak learner (e.g., regression trees) is used to optimize a loss function in a sequential procedure akin to gradient descent. Choosing the loss function according to the statistical problem at hand results in a versatile and efficient tool that can handle classification, regression, quantile regression or survival analysis. The popularity of gradient boosting is also due to its efficient implementation in the R package gbm by Ridgeway (2007). Alongside these methodological developments, strong theoretical results have justified the good performance of boosting. Consistency of boosting algorithms, i.e. their ability to achieve the optimal Bayes error rate for large samples, is considered in Breiman (2004), Zhang and Yu (2005) or Bartlett and Traskin (2007). The present paper is strongly influenced by Bühlmann and Yu (2003), which proposes an analysis of regression boosting algorithms built on linear base learners thanks to explicit formulas for the boosted predictor and its error rate. In this paper, we focus on gradient boosting for regression with square loss and we briefly describe the corresponding algorithm.
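As an illustration of the square-loss setting with a linear base learner, the following is a minimal sketch and not the authors' implementation. It assumes the boosted fit is initialized at zero and that the base learner acts on the training responses through a fixed smoother matrix `S`; the ridge-on-polynomial-features smoother, the function names, and the parameter values are all hypothetical choices made for the example. Under these assumptions the residuals after m steps with learning rate ν equal (I - νS)^m y, so the iterative fit can be checked against the explicit formula y - (I - νS)^m y.

```python
import numpy as np


def l2_boost_linear(S, y, learning_rate, n_steps):
    """Iterative L2-boosting: repeatedly fit the linear learner S to the residuals."""
    fit = np.zeros_like(y, dtype=float)  # assumption: start from the zero predictor
    for _ in range(n_steps):
        residuals = y - fit  # negative gradient of the square loss (up to a constant)
        fit = fit + learning_rate * (S @ residuals)
    return fit


def l2_boost_closed_form(S, y, learning_rate, n_steps):
    """Explicit formula for the same fit: y - (I - nu * S)^m y."""
    n = len(y)
    shrink = np.linalg.matrix_power(np.eye(n) - learning_rate * S, n_steps)
    return y - shrink @ y


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n = 20
x = np.sort(rng.uniform(0.0, 1.0, n))
y = np.sin(2.0 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Hypothetical linear base learner: ridge regression on cubic polynomial features.
X = np.vander(x, 4, increasing=True)
S = X @ np.linalg.solve(X.T @ X + 0.1 * np.eye(4), X.T)

iterative = l2_boost_linear(S, y, learning_rate=0.1, n_steps=50)
closed = l2_boost_closed_form(S, y, learning_rate=0.1, n_steps=50)
print(np.allclose(iterative, closed))
```

The two functions differ only in how they compute the fit: the loop mirrors the sequential procedure, while the closed form relies on the base learner being linear in the responses.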
Dec-29-2020