I recently completed a multi-class classification problem given as a take-home assignment for a data scientist position. It was a good opportunity to compare two state-of-the-art implementations of gradient-boosted decision trees: XGBoost and LightGBM. Both are powerful enough to appear consistently among the best-performing machine learning models. The dataset contains over 60 thousand observations and 103 numerical features, and the target variable has 9 classes.
Microsoft has been steadily expanding its tooling in the predictive analytics and machine learning space. One such tool, released recently, is LightGBM. From the GitHub site: "LightGBM is a fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks." Microsoft is clearly working to capitalize on the machine learning and big data movement, and I hope they continue to develop tools such as LightGBM and R with SQL Server.
In this essay, we evaluated the feasibility of applying machine learning models to forecasting corporate fundamentals (i.e., earnings), comparing our method's predictions with both analysts' consensus estimates and traditional statistical models. The results show that our model can serve as a useful auxiliary tool to help analysts produce better forecasts of company fundamentals. Compared with traditional statistical models widely used in the industry, such as logistic regression, our method achieves clear gains in both prediction accuracy and speed. We also believe the model has considerable room to improve, and we hope that in the near future machine learning models can outperform professional analysts.
The average price is clearly higher when the buyer pays shipping, and there also appears to be variation in the average price across item condition IDs. Based on this exploratory data analysis, I decided to use all the features to build the model. LightGBM, developed under the umbrella of Microsoft's DMTK project, is a gradient boosting framework that uses tree-based learning algorithms, so we are going to give it a try.
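The two comparisons above can be sketched with simple groupbys. The column names (`price`, `shipping`, `item_condition_id`) and the toy rows below are assumptions standing in for the actual listing data:

```python
import pandas as pd

# Toy rows standing in for the listing data; column names and values
# are illustrative assumptions, not the real dataset.
df = pd.DataFrame({
    "price": [10.0, 35.0, 12.0, 50.0, 8.0, 22.0],
    "shipping": [0, 1, 0, 1, 0, 1],          # 1 = buyer pays shipping (assumed encoding)
    "item_condition_id": [1, 1, 2, 3, 2, 3],
})

# Average price by who pays shipping, and by item condition.
print(df.groupby("shipping")["price"].mean())
print(df.groupby("item_condition_id")["price"].agg(["mean", "count"]))
```

Plotting these group means (e.g. as bar charts) is what surfaces the patterns described above.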
Light Gradient Boosted Machine, or LightGBM for short, is an efficient and effective implementation of the gradient boosting algorithm. LightGBM extends gradient boosting with a form of automatic feature selection (Exclusive Feature Bundling) and by focusing on examples with larger gradients (Gradient-based One-Side Sampling). This can dramatically speed up training and improve predictive performance. As a result, LightGBM has become a de facto standard for machine learning competitions on tabular regression and classification tasks, and, together with Extreme Gradient Boosting (XGBoost), it shares some of the credit for the increased popularity and wider adoption of gradient boosting methods in general.
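As a rough illustration, the gradient-based sampling mentioned above is exposed in LightGBM as the "goss" boosting type. The parameter dictionary below is a hypothetical configuration sketch (parameter names follow LightGBM's documented options; the values are illustrative, not tuned):

```python
# Hypothetical LightGBM configuration for a 9-class problem, to be passed
# to lightgbm.train(); values are illustrative only.
params = {
    "objective": "multiclass",
    "num_class": 9,
    "boosting": "goss",       # Gradient-based One-Side Sampling
    "top_rate": 0.2,          # keep the 20% of examples with the largest gradients
    "other_rate": 0.1,        # randomly sample 10% of the remaining examples
    "learning_rate": 0.1,
    "num_leaves": 31,         # LightGBM grows trees leaf-wise; this caps leaves per tree
    "feature_fraction": 0.8,  # subsample features on each iteration
}
```

Keeping all large-gradient examples while subsampling the rest is what lets GOSS shrink the training set per iteration without discarding the examples the model is currently getting most wrong.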