Benchmarking and Optimization of Gradient Boosted Decision Tree Algorithms
Anghel, Andreea, Papandreou, Nikolaos, Parnell, Thomas, De Palma, Alessandro, Pozidis, Haralampos
Abstract--Gradient boosted decision trees (GBDTs) have seen widespread adoption in academia, industry and competitive data science due to their state-of-the-art performance in a wide variety of machine learning tasks. In this paper, we present an extensive empirical comparison of XGBoost, LightGBM and CatBoost, three popular GBDT algorithms, to aid the data science practitioner in choosing among the multitude of available implementations. Specifically, we run them on four large-scale datasets with varying shapes, sparsities and learning tasks, in order to evaluate the algorithms' generalization performance, training times (on both CPU and GPU) and their sensitivity to hyper-parameter tuning. In our analysis, we first make use of a distributed grid-search to benchmark the algorithms on fixed configurations, and then employ a state-of-the-art algorithm for Bayesian hyper-parameter optimization to fine-tune the models.

Many powerful techniques in machine learning involve constructing a strong learner from a number of weak learners. One such approach, known as bagging, combines the predictions of a large number of weak learners, each using a different bootstrap sample of the training data set [1]. It has been shown that such an approach can reduce variance and produce a strong learner. An alternative approach, known as boosting, involves iteratively training a sequence of weak learners, whereby the training examples for the next learner are weighted according to the success of the previously constructed learners.
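The contrast between bagging and boosting described above can be illustrated with a minimal sketch. It is not the method benchmarked in the paper. It assumes one-dimensional decision stumps as weak learners and an AdaBoost-style reweighting rule for the boosting part, which is a simpler relative of the gradient boosting used by XGBoost, LightGBM and CatBoost. The toy data and all function names (`fit_stump`, `bagging`, `boosting`, `ensemble_predict`, `accuracy`) are hypothetical and chosen only for illustration.

```python
import math
import random

# Toy 1-D data (an assumption for illustration): no single threshold separates it.
XS = list(range(10))
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` at or above the threshold, else the opposite."""
    return polarity if x >= threshold else -polarity


def fit_stump(xs, ys, weights):
    """Return (threshold, polarity, weighted_error) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[2]:
                best = (threshold, polarity, err)
    return best


def bagging(xs, ys, n_learners, rng):
    """Fit each stump on its own bootstrap sample; all votes count equally."""
    n = len(xs)
    ensemble = []
    for _ in range(n_learners):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        bx = [xs[i] for i in idx]
        by = [ys[i] for i in idx]
        threshold, polarity, _ = fit_stump(bx, by, [1.0 / n] * n)
        ensemble.append((threshold, polarity, 1.0))
    return ensemble


def boosting(xs, ys, n_learners):
    """AdaBoost-style loop: misclassified examples gain weight for the next stump."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(n_learners):
        threshold, polarity, err = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote strength of this stump
        ensemble.append((threshold, polarity, alpha))
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def ensemble_predict(ensemble, x):
    """Weighted majority vote over all stumps."""
    score = sum(alpha * stump_predict(x, t, p) for t, p, alpha in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, xs, ys):
    hits = sum(ensemble_predict(ensemble, x) == y for x, y in zip(xs, ys))
    return hits / len(xs)


if __name__ == "__main__":
    rng = random.Random(0)
    print("bagging :", accuracy(bagging(XS, YS, 25, rng), XS, YS))
    print("boosting:", accuracy(boosting(XS, YS, 3), XS, YS))
```

On this toy data a single stump classifies at most 7 of the 10 points correctly, while three boosted stumps fit all of them, because each round concentrates weight on the points the earlier stumps got wrong. The bagged ensemble depends on the random bootstrap samples, so its accuracy varies with the seed.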
Sep-12-2018
- Country:
- North America > United States
- New Jersey > Mercer County
- Princeton (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- Europe > Switzerland
- Asia > Middle East
- Israel > Haifa District > Haifa (0.04)
- Genre:
- Research Report (0.50)