Microsoft releases LightGBM


Microsoft has been steadily expanding its tools for predictive analytics and machine learning. One recent release is LightGBM. From the GitHub site: LightGBM is a fast, distributed, high performance gradient boosting (GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks. Microsoft is clearly trying to capitalize on the machine learning and big data movement. I hope they continue to develop tools such as LightGBM and R with SQL Server.

Faster Boosting with Smaller Memory

Neural Information Processing Systems

State-of-the-art implementations of boosting, such as XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires enough memory to hold two to three times the size of the training set. This paper presents an alternative approach to implementing boosted trees, which achieves a significant speedup over XGBoost and LightGBM, especially when the memory size is small. This is achieved using a combination of three techniques: early stopping, effective sample size, and stratified sampling. Our experiments demonstrate a 10-100x speedup over XGBoost when the training data is too large to fit in memory.
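The abstract names effective sample size as one of its techniques but does not define it. A common definition for a weighted sample is (sum of weights)^2 / (sum of squared weights); the sketch below assumes that definition, and the function name is hypothetical rather than taken from the paper. It shows how a few heavily weighted examples shrink the effective size of a sample, which is the kind of signal a boosting implementation could use to decide when to resample.

```python
def effective_sample_size(weights):
    """Effective sample size under the common (sum w)^2 / sum(w^2) definition.

    Assumption: this is a standard textbook formula, not necessarily the
    exact quantity used in the paper.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0:
        raise ValueError("at least one weight must be non-zero")
    return total * total / total_sq


# Equal weights: every example counts fully.
uniform = [1.0] * 100
# One dominant weight: the sample behaves like only a handful of examples.
skewed = [100.0] + [1.0] * 99

print(effective_sample_size(uniform))
print(effective_sample_size(skewed))
```

With equal weights the result equals the number of examples; as the weights become more skewed, the value falls toward 1.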

Clashes break out in Yemen's key port city of Hodeida after cease-fire

The Japan Times

SANAA - Fighting erupted in Yemen's key port city of Hodeida on Sunday, the first significant clashes since warring sides agreed to a U.N.-brokered cease-fire deal in December, security officials and eyewitnesses said. Fires burned on the main front lines in the city's east and south, while exchanges of artillery fire shook buildings in combat that broke out overnight, they said. Both the Shiite Houthi rebels who hold the city and the government-backed troops who oppose them have been seen erecting barricades and digging defensive trenches. "All night long, we hear the loud roar of machine guns and artillery, which had been silent for the past two weeks," said resident Ahmed Nasser, adding that he was worried for relatives who had returned to the July 7 neighborhood on the city's eastern front. The officials spoke on condition of anonymity as they weren't authorized to brief journalists, while witnesses did so for fear of their safety.

Machine Learning for Retail Price Recommendation with Python


The average price is clearly higher when the buyer pays shipping. The average price also appears to vary across item condition IDs. After the exploratory data analysis above, I decided to use all the features to build our model. LightGBM, part of Microsoft's DMTK project, is a gradient boosting framework that uses tree-based learning algorithms, so we are going to give it a try.

Lightning Fast XGBoost on Multiple GPUs


XGBoost is one of the most used libraries for data science. When XGBoost first appeared, it was lightning fast compared to its nearest rival, Python's Scikit-learn GBM. Over time, however, it has been rivaled by some awesome libraries like LightGBM and CatBoost, on both speed and accuracy. I, for one, use LightGBM for most use cases where I only have a CPU for training. But when I have a GPU or multiple GPUs at my disposal, I still love to train with XGBoost.