AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

GPU-acceleration for Large-scale Tree Boosting

Zhang, Huan, Si, Si, Hsieh, Cho-Jui

arXiv.org Machine Learning | Jun-26-2017

In this paper, we present a novel massively parallel algorithm for accelerating the decision tree building procedure on GPUs (Graphics Processing Units), which is a crucial step in Gradient Boosted Decision Tree (GBDT) and random forest training. Previous GPU-based tree building algorithms are based on parallel multi-scan or radix sort to find the exact tree split, and thus suffer from scalability and performance issues. We show that using a histogram-based algorithm to approximately find the best split is more efficient and scalable on GPU. By identifying the difference between classical GPU-based image histogram construction and the feature histogram construction in decision tree training, we develop a fast feature histogram building kernel on GPU with a carefully designed computational and memory access sequence to reduce atomic update conflicts and maximize GPU utilization. Our algorithm can be used as a drop-in replacement for histogram construction in popular tree boosting systems to improve their scalability. As an example, to train GBDT on the epsilon dataset, our method using a mainstream GPU is 7-8 times faster than the histogram-based algorithm on CPU in LightGBM and 25 times faster than the exact-split finding algorithm in XGBoost on a dual-socket 28-core Xeon server, while achieving similar prediction accuracy.
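To make the histogram idea in the abstract above concrete, here is a minimal CPU-only sketch of approximate split finding for a single feature. It is not the paper's GPU kernel: the function name `best_histogram_split`, the equal-width binning, and the gain formula (sum of squared gradient totals per side, normalised by sample count) are assumptions chosen for illustration.

```python
from typing import List, Tuple


def best_histogram_split(
    values: List[float],
    gradients: List[float],
    num_bins: int = 4,
) -> Tuple[float, float]:
    """Return (threshold, gain) for the best bin-boundary split of one feature.

    The gain G_L^2/n_L + G_R^2/n_R - G^2/n is an assumed criterion,
    used here only to illustrate the scan over histogram bins.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins or 1.0  # fall back to 1.0 for constant features

    # Build the feature histogram: per-bin gradient sum and sample count.
    grad_sum = [0.0] * num_bins
    count = [0] * num_bins
    for v, g in zip(values, gradients):
        b = min(int((v - lo) / width), num_bins - 1)
        grad_sum[b] += g
        count[b] += 1

    total_g, total_n = sum(grad_sum), sum(count)
    best_threshold, best_gain = lo, 0.0
    left_g, left_n = 0.0, 0
    # Scan bin boundaries only: num_bins - 1 candidates instead of one per sample.
    for b in range(num_bins - 1):
        left_g += grad_sum[b]
        left_n += count[b]
        right_g, right_n = total_g - left_g, total_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        gain = left_g**2 / left_n + right_g**2 / right_n - total_g**2 / total_n
        if gain > best_gain:
            best_threshold, best_gain = lo + (b + 1) * width, gain
    return best_threshold, best_gain
```

The point of the sketch is the cost profile: once the histogram is built, the split search touches only the bins, not the individual samples, which is why the approach trades exactness for scalability.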

artificial intelligence, histogram, machine learning, (18 more...)

arXiv.org Machine Learning

1706.08359

Country:

North America > United States > California > Yolo County > Davis (0.04)
North America > United States > New York (0.04)
North America > United States > District of Columbia > Washington (0.04)
North America > United States > California > Santa Clara County > Mountain View (0.04)

Genre:

Research Report (0.64)
Workflow (0.48)

Technology:

Information Technology > Hardware (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

How Does the Random Forest Algorithm Work in Machine Learning

#artificialintelligence | Jun-22-2017, 23:10:33 GMT

In this article, you are going to learn about one of the most popular classification algorithms: the random forest algorithm. As motivation to go further, I am going to give you one of the best advantages of random forest. The random forest algorithm can be used for both classification and regression problems. The same algorithm for both classification and regression? You might be thinking I am kidding.
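One way to see why the same forest can serve both tasks is that only the final aggregation step differs. The sketch below assumes the individual trees have already produced their predictions; the function names `forest_classify` and `forest_regress` are hypothetical and are not taken from the article.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, Sequence


def forest_classify(tree_votes: Sequence[Hashable]) -> Hashable:
    """Classification: return the label predicted by the most trees."""
    return Counter(tree_votes).most_common(1)[0][0]


def forest_regress(tree_outputs: Sequence[float]) -> float:
    """Regression: return the average of the per-tree numeric predictions."""
    return mean(tree_outputs)
```

For example, `forest_classify(["spam", "ham", "spam"])` takes a majority vote over three trees, while `forest_regress([2.0, 4.0, 6.0])` averages three numeric outputs.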

algorithm, artificial intelligence, machine learning, (15 more...)

#artificialintelligence

Industry: Banking & Finance (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Using ANNs on small data – Deep Learning vs. Xgboost

@machinelearnbot | Jun-20-2017, 14:15:07 GMT

Andrew Beam does a great job showing that small datasets are not off limits for current neural net methods. If you use the regularisation methods at hand, ANNs are entirely possible to use instead of classic methods. Let's see how this holds up on some benchmark datasets. Let's start with the iris dataset, which you can pull straight off the internet with the pandas read_csv function. We create a feature matrix X and a target y from the pandas dataframe.

artificial intelligence, deep learning, machine learning, (6 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.52)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.42)

Reviving Threshold-Moving: a Simple Plug-in Bagging Ensemble for Binary and Multiclass Imbalanced Data

Collell, Guillem, Prelec, Drazen, Patil, Kaustubh

arXiv.org Machine Learning | Jun-20-2017

Class imbalance presents a major hurdle in the application of data mining methods. A common practice to deal with it is to create ensembles of classifiers that learn from resampled balanced data, for example, bagged decision trees combined with random undersampling (RUS) or the synthetic minority oversampling technique (SMOTE). However, most of the resampling methods entail asymmetric changes to the examples of different classes, which in turn can introduce their own biases in the model. Furthermore, those methods require a performance measure to be specified a priori, before learning. An alternative is to use a so-called threshold-moving method that a posteriori changes the decision threshold of a model to counteract the imbalance, and thus has the potential to adapt to the performance measure of interest. Surprisingly, little attention has been paid to the potential of combining a bagging ensemble with threshold-moving. In this paper, we present probability thresholding bagging (PT-bagging), a versatile plug-in method that fills this gap. Contrary to usual rebalancing practice, our method preserves the natural class distribution of the data, resulting in well-calibrated posterior probabilities. We also extend the proposed method to handle multiclass data. The method is validated on binary and multiclass benchmark data sets. We perform analyses that provide insights into the proposed method.
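The threshold-moving idea can be sketched in a few lines. The abstract does not spell out the decision rule, so the rule used here (divide each averaged posterior by the class prior and pick the largest ratio) is an assumption for illustration, as are the function names `average_posteriors` and `threshold_moved_class`. In the binary case this amounts to comparing the minority-class probability against its prior instead of against 0.5.

```python
from typing import List, Sequence


def average_posteriors(per_model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average the class-probability vectors produced by the bagged models."""
    n_models = len(per_model_probs)
    n_classes = len(per_model_probs[0])
    return [sum(p[c] for p in per_model_probs) / n_models for c in range(n_classes)]


def threshold_moved_class(posterior: Sequence[float], priors: Sequence[float]) -> int:
    """Pick the class with the largest posterior-to-prior ratio (assumed rule)."""
    ratios = [p / q for p, q in zip(posterior, priors)]
    return max(range(len(ratios)), key=ratios.__getitem__)
```

With two bagged models outputting `[0.8, 0.2]` and `[0.6, 0.4]` and class priors `[0.9, 0.1]`, a plain argmax would choose class 0, whereas the ratio rule chooses the rare class 1 because its averaged probability is well above its prior. The models themselves are trained on the data as-is, with no resampling.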

artificial intelligence, machine learning, threshold, (18 more...)

arXiv.org Machine Learning

1606.08698

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Gradient Boosting – the coolest kid on the machine learning block

#artificialintelligence | Jun-17-2017, 04:05:25 GMT

Gradient boosting is a technique attracting attention for its prediction speed and accuracy, especially with large and complex data. Don't just take my word for it: the chart below shows the rapid growth of Google searches for xgboost (the most popular gradient boosting R package). From data science competitions to machine learning solutions for business, gradient boosting has produced best-in-class results. In this blog post I describe what it is and how to use it. Machine learning models can be fitted to data individually, or combined in an ensemble.
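As a rough illustration of how gradient boosting combines models in an ensemble, the sketch below fits one-split regression stumps to the residuals of the current ensemble, which for squared-error loss are the negative gradients. It is a toy version on a single feature, not xgboost's implementation; the names `fit_stump`, `fit_boosted`, and `predict`, and the default of 20 rounds with a learning rate of 0.5, are assumptions.

```python
from typing import List, Sequence, Tuple

Stump = Tuple[float, float, float]  # (threshold, left_value, right_value)


def fit_stump(x: Sequence[float], residuals: Sequence[float]) -> Stump:
    """Fit a one-split regression stump minimising squared error on residuals."""
    best: Tuple[float, Stump] = (float("inf"), (x[0], 0.0, 0.0))
    # Candidate thresholds: every distinct value except the largest.
    for threshold in sorted(set(x))[:-1]:
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if sse < best[0]:
            best = (sse, (threshold, lv, rv))
    return best[1]


def predict(base: float, stumps: List[Stump], rate: float, xi: float) -> float:
    """Sum the base value and the shrunken contribution of every stump."""
    total = base
    for threshold, lv, rv in stumps:
        total += rate * (lv if xi <= threshold else rv)
    return total


def fit_boosted(
    x: Sequence[float], y: Sequence[float], n_rounds: int = 20, rate: float = 0.5
) -> Tuple[float, List[Stump]]:
    """Start from the mean of y, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    stumps: List[Stump] = []
    for _ in range(n_rounds):
        residuals = [yi - predict(base, stumps, rate, xi) for xi, yi in zip(x, y)]
        stumps.append(fit_stump(x, residuals))
    return base, stumps
```

Each round corrects part of what the previous rounds got wrong, so the models are built sequentially rather than independently, in contrast to bagging-style ensembles.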

accuracy, artificial intelligence, machine learning, (10 more...)

#artificialintelligence

Industry: Education (0.77)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Boosting the accuracy of your Machine Learning models

#artificialintelligence | Jun-12-2017, 07:00:39 GMT

Boosting is here to help. Boosting is a popular machine learning algorithm that increases the accuracy of your model, much as racers use a nitrous boost to increase the speed of their car. Boosting uses a base machine learning algorithm to fit the data. This can be any algorithm, but a decision tree is the most widely used. To find out why, just keep reading.

algorithm, artificial intelligence, machine learning, (16 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.44)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.34)

Displayr Gradient Boosting - the coolest kid on the machine learning block

#artificialintelligence | Jun-10-2017, 20:40:22 GMT

Gradient boosting is a technique attracting attention for its prediction speed and accuracy, especially with large and complex data, as evidenced in the chart below showing the rapid growth of Google searches for xgboost (the best gradient boosting R package). From data science competitions to machine learning solutions for business, gradient boosting has produced best-in-class results. In this blog post I describe what it is and how to use it in Displayr. Gradient boosting is a type of boosting.

artificial intelligence, machine learning, target outcome, (14 more...)

#artificialintelligence

Industry: Education (0.77)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Improving Predictions with Ensemble Model

@machinelearnbot | Jun-7-2017, 23:40:08 GMT

"Alone we can do so little and together we can do much", a phrase from Helen Keller in the 1950s, reflects achievements and success stories in real-life scenarios over the decades. The same applies to most high-impact innovation and to the world of advanced technology. The machine learning domain is in the same race to make predictions and classifications more accurate using so-called ensemble methods, and ensemble modeling has proved to be one of the most convincing ways to build highly accurate predictive models. Ensemble methods are learning models that achieve performance by combining the opinions of multiple learners. Typically, an ensemble model is a supervised learning technique that combines multiple weak learners or models to produce a strong learner, using bagging and boosting for data sampling.
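The bagging half of that description can be sketched as follows: each learner is trained on a bootstrap sample (drawn with replacement) of the training data, and the ensemble combines their "opinions" by majority vote. The base learner here is a one-nearest-neighbour rule on a single feature, picked only to keep the example short; it and the function names are assumptions, not something the article prescribes.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[float, int]  # (feature value, class label)
Model = Callable[[float], int]


def bootstrap_sample(data: Sequence[Example], rng: random.Random) -> List[Example]:
    """Draw len(data) examples with replacement."""
    return [rng.choice(data) for _ in data]


def fit_nearest_neighbour(sample: Sequence[Example]) -> Model:
    """Hypothetical base learner: predict the label of the closest sampled point."""
    def model(x: float) -> int:
        return min(sample, key=lambda ex: abs(ex[0] - x))[1]
    return model


def bagged_predict(
    data: Sequence[Example], x: float, n_models: int = 25, seed: int = 0
) -> int:
    """Train n_models learners on bootstrap samples and take a majority vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = bootstrap_sample(data, rng)
        votes.append(fit_nearest_neighbour(sample)(x))
    return Counter(votes).most_common(1)[0][0]
```

Because every learner sees a slightly different sample, their individual errors tend to differ, and the vote smooths them out. Boosting, by contrast, trains learners in sequence so that each one focuses on the mistakes of its predecessors.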

artificial intelligence, learner, machine learning, (12 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.57)

Walmart Competition: Trip Type Classification

@machinelearnbot | Jun-6-2017, 18:40:15 GMT

They took the NYC Data Science Academy 12-week full-time data science bootcamp program from Sep. 23 to Dec. 18, 2015. The post was based on their fourth in-class project (due after the 8th week of the program). Walmart uses trip type classification to segment its shoppers and their store visits in order to improve the shopping experience. Walmart's trip types are created from a combination of existing customer insights and purchase history data. The purpose of the Kaggle competition is to use only the purchase data provided to derive Walmart's classification labels.

machine learning, natural language, text classification, (15 more...)

@machinelearnbot

Industry: Retail (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)

InfiniteBoost: building infinite ensembles with gradient descent

Rogozhnikov, Alex, Likhomanenko, Tatiana

arXiv.org Machine Learning | Jun-4-2017

In machine learning, ensemble methods have demonstrated high accuracy for a variety of problems in different areas. The best-known algorithms used intensively in practice are random forests and gradient boosting. In this paper we present InfiniteBoost, a novel algorithm that combines the best properties of these two approaches. The algorithm constructs an ensemble of trees for which two properties hold: the trees of the ensemble incorporate the mistakes made by the others; at the same time, the ensemble can contain an infinite number of trees without overfitting. The proposed algorithm is evaluated on regression, classification, and ranking tasks using large-scale, publicly available datasets.

artificial intelligence, gradient, machine learning, (16 more...)

arXiv.org Machine Learning

1706.01109

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
Asia > Russia (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.51)
