I recently participated in this Kaggle competition (WIDS Datathon by Stanford) where I was able to land up in Top 10 using various boosting algorithms. Since then, I have been very curious about the fine workings of each model including parameter tuning, pros and cons and hence decided to write this blog. Despite the recent re-emergence and popularity of neural networks, I am focusing on boosting algorithms because they are still more useful in the regime of limited training data, little training time and little expertise for parameter tuning. Since XGBoost (often called GBM Killer) has been in the machine learning world for a longer time now with lots of articles dedicated to it, this post will focus more on CatBoost & LGBM. LightGBM uses a novel technique of Gradient-based One-Side Sampling (GOSS) to filter out the data instances for finding a split value while XGBoost uses pre-sorted algorithm & Histogram-based algorithm for computing the best split.
In this paper we present CatBoost, a new open-sourced gradient boosting library that successfully handles categorical features and outperforms existing publicly available implementations of gradient boosting in terms of quality on a set of popular publicly available datasets. The library has a GPU implementation of learning algorithm and a CPU implementation of scoring algorithm, which are significantly faster than other gradient boosting libraries on ensembles of similar sizes.
CatBoost is based on gradient boosting. A new machine learning technique developed by Yandex that outperforms many existing boosting algorithms like XGBoost, Light GBM. While deep learning algorithms requires lots of data and computational power, boosting algorithms are still in need for most of the business problems. However boosting algorithms like XGBoost takes hours to train and sometimes you'll get frustrated while tuning hyper-parameters. On the other hand, CatBoost is easy to implement and very powerful.
Gradient boosting decision trees is the state of the art for structured data problems. Two modern algorithms that make gradient boosted tree models are XGBoost and LightGBM. In this article I'll summarize their introductory papers for each algorithm's approach. Gradient Boosting Decision Trees (GBDT) are currently the best techniques for building predictive models from structured data. Unlike models for analyzing images (for that you want to use a deep learning model), structured data problems can be solved very well with a lot of decision trees.
In the previous post, we learned about tree based learning methods - basics of tree based models and the use of bagging to reduce variance. We also looked at one of the most famous learning algorithms based on the idea of bagging- random forests. In this post, we will look into the details of yet another type of tree-based learning algorithms: boosted trees. Boosting, similar to Bagging, is a general class of learning algorithm where a set of weak learners are combined to get strong learners. For classification problems, a weak learner is defined to be a classifier which is only slightly correlated with the true classification (it can label examples better than random guessing). In contrast, a strong learner is a classifier that is arbitrarily well-correlated with the true classification.