AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

A (small) introduction to Boosting

#artificialintelligence | Apr-18-2016, 04:10:51 GMT

Boosting is a machine learning meta-algorithm that aims to iteratively build an ensemble of weak learners, in an attempt to generate a strong overall model. For example, consider a problem of binary classification with approximately 50% of samples belonging to each class. Random guessing in this case would yield an accuracy of around 50%. So a weak learner would be any algorithm, however simple, that slightly improves this score – say 51-55% or more. Usually, weak learners are pretty basic in nature.
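
As a rough illustration of the idea above, the following sketch boosts one-dimensional threshold rules (decision stumps) with AdaBoost-style reweighting. AdaBoost is only one boosting algorithm and is not named in the post itself; the toy data, the function names (`fit_stump`, `adaboost`, `predict`, `accuracy`) and the choice of stumps as weak learners are all assumptions made for this example.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak learner: predict `polarity` at or above the threshold, else the opposite."""
    return polarity if x >= threshold else -polarity

def fit_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best threshold rule."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for w, x, y in zip(weights, xs, ys)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Build a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = fit_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)    # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Increase the weight of misclassified points, decrease the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote over all stumps."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

def accuracy(ensemble, xs, ys):
    return sum(predict(ensemble, x) == y for x, y in zip(xs, ys)) / len(xs)

# Hypothetical toy data: no single threshold separates the two classes.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

print("1 stump :", accuracy(adaboost(xs, ys, 1), xs, ys))
print("3 stumps:", accuracy(adaboost(xs, ys, 3), xs, ys))
```

On this toy set the best single stump classifies 7 of the 10 points correctly, while three boosted stumps classify all 10, which is the weak-to-strong effect the post describes.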

artificial intelligence, decision tree learning, machine learning, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.33)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.32)

Tuning Parameters for Boosting/Bagging/Random Forest • /r/MachineLearning

@machinelearnbot | Apr-17-2016, 21:05:11 GMT

Random forests usually perform quite well with the default settings: a bootstrap resampling scheme, unpruned trees, as many trees as you can afford while still getting results in a reasonable amount of time, and sqrt(#features) features tried per split (the mtry parameter). You can then try to optimize these choices by checking the results on out-of-bag data (the samples each tree didn't train on because of the resampling scheme). If you have very unbalanced classes, you should decide on a measure of interest (such as the true positive rate) and tune the related parameter. Out-of-bag data can be trusted almost as much as a proper cross-validation if you use enough trees and bootstrap resampling.
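
To make the out-of-bag idea concrete, here is a small sketch of bagging with a random feature subset per model and an out-of-bag error estimate. It is not a full random forest: it assumes single-split stumps instead of unpruned trees to stay short, the data is synthetic, and every function name (`bootstrap_indices`, `fit_stump`, `oob_error`, and so on) is hypothetical. Only the bootstrap scheme, the sqrt(#features) rule and the out-of-bag check come from the comment above.

```python
import math
import random
from collections import Counter

def bootstrap_indices(n, rng):
    """Draw n row indices with replacement; return (in_bag, out_of_bag)."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    out_of_bag = sorted(set(range(n)) - set(in_bag))
    return in_bag, out_of_bag

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def fit_stump(X, y, rows, features):
    """Pick the (feature, threshold) split with the fewest errors on `rows`."""
    best = None
    for f in features:
        for t in sorted({X[i][f] for i in rows}):
            left = [y[i] for i in rows if X[i][f] < t]
            right = [y[i] for i in rows if X[i][f] >= t]
            if not left:
                continue  # a threshold at the minimum value splits nothing off
            l_label, r_label = majority(left), majority(right)
            errors = (sum(v != l_label for v in left)
                      + sum(v != r_label for v in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_label, r_label)
    if best is None:  # all candidate features constant: predict the majority
        label = majority([y[i] for i in rows])
        best = (None, features[0], float("-inf"), label, label)
    return best

def predict_stump(stump, row):
    _, f, t, l_label, r_label = stump
    return l_label if row[f] < t else r_label

def oob_error(X, y, n_trees, seed=0):
    """Return (error rate, rows scored), using only votes from models
    that did not see the row during training."""
    rng = random.Random(seed)
    n, n_features = len(X), len(X[0])
    mtry = max(1, int(math.sqrt(n_features)))  # sqrt(#features) per model
    votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        in_bag, out_of_bag = bootstrap_indices(n, rng)
        features = rng.sample(range(n_features), mtry)
        stump = fit_stump(X, y, in_bag, features)
        for i in out_of_bag:
            votes[i][predict_stump(stump, X[i])] += 1
    scored = [i for i in range(n) if votes[i]]
    if not scored:
        raise ValueError("no row was ever out of bag; use more trees")
    wrong = sum(votes[i].most_common(1)[0][0] != y[i] for i in scored)
    return wrong / len(scored), len(scored)

# Synthetic data (an assumption): the label depends on feature 0 only.
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(4)] for _ in range(40)]
y = [1 if row[0] > 0.5 else 0 for row in X]

err, n_scored = oob_error(X, y, n_trees=50)
print(f"OOB error {err:.2f} estimated on {n_scored} of {len(X)} rows")
```

Because every row is scored only by models that never trained on it, the estimate behaves like a built-in holdout, which is why no separate validation split is drawn here.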

artificial intelligence, decision tree learning, tuning parameter, (4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Accurate Sales Forecast for Data Analysts: Building a Random Forest model with Just SQL and Hivemall | Treasure Data Blog

#artificialintelligence | Apr-12-2016, 23:45:59 GMT

In this blog post, we will use Hivemall, the open source Machine Learning-on-SQL library available in the Treasure Data environment, to introduce the basics of machine learning. We will use an e-commerce dataset from Kaggle, the data science competition platform. The first challenge is predicting the retail sales for the Rossmann stores (full details are on Kaggle). We will use an ensemble learning technique known as Random Forest regression. Rossmann is a pharmacy chain with over 3,000 stores in seven countries within Europe.

artificial intelligence, machine learning, training data, (13 more...)

Country: Europe > Germany (0.05)

Industry: Marketing (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.65)

A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python)

#artificialintelligence | Apr-12-2016, 17:11:10 GMT

Tree-based learning algorithms are considered to be among the best and most widely used supervised learning methods. Tree-based methods give predictive models high accuracy, stability and ease of interpretation. Unlike linear models, they map non-linear relationships quite well. They can be adapted to either kind of problem at hand (classification or regression). Methods like decision trees, random forest and gradient boosting are widely used in all kinds of data science problems. Hence, for every analyst (beginners included), it's important to learn these algorithms and use them for modeling. This tutorial is meant to help beginners learn tree-based modeling from scratch. After completing this tutorial, one is expected to become proficient at using tree-based algorithms and building predictive models. Note: This tutorial requires no prior knowledge of machine learning.

algorithm, artificial intelligence, machine learning, (18 more...)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Intro to Machine Learning in H2O

#artificialintelligence | Apr-9-2016, 23:36:26 GMT

The focus of this workshop is machine learning using the H2O R and Python packages. H2O is an open source distributed machine learning platform designed for big data, with the added benefit that it's easy to use on a laptop (in addition to a multi-node Hadoop or Spark cluster). The core machine learning algorithms of H2O are implemented in high-performance Java; however, fully featured APIs are available in R, Python, Scala, REST/JSON and also through a web interface. Since H2O's algorithm implementations are distributed, this allows the software to scale to very large datasets that may not fit into RAM on a single machine. H2O currently features distributed implementations of generalized linear models, gradient boosting machines, random forest, deep neural nets, dimensionality reduction methods (PCA, GLRM), clustering algorithms (K-means), and anomaly detection methods, among others.

artificial intelligence, data mining, machine learning, (8 more...)

Genre: Instructional Material > Course Syllabus & Notes (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.99)
Information Technology > Data Science > Data Mining > Big Data (0.61)
Information Technology > Data Science > Data Mining > Anomaly Detection (0.61)
(2 more...)

Ensemble Methods: Elegant Techniques to Produce Improved Machine Learning Results

#artificialintelligence | Apr-9-2016, 09:55:34 GMT

Ensemble methods are techniques that create multiple models and then combine them to produce improved results. Ensemble methods usually produce more accurate solutions than a single model would. This has been the case in a number of machine learning competitions, where the winning solutions used ensemble methods. In the popular Netflix Competition, the winner used an ensemble method to implement a powerful collaborative filtering algorithm. Another example is KDD 2009, where the winner also used ensemble methods.
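
One way to see why combining models can help is a short calculation for majority voting. The sketch below assumes that every model is right with the same probability `p` and that their errors are fully independent, which real models rarely satisfy; the function name `majority_vote_accuracy` is made up for this example, and voting is only one of several ways to combine models.

```python
from math import comb

def majority_vote_accuracy(p, n_models):
    """Probability that a strict majority of independent models is correct,
    when each model is correct with probability p (binomial tail sum)."""
    if n_models % 2 == 0:
        raise ValueError("use an odd number of models to avoid ties")
    needed = n_models // 2 + 1
    return sum(comb(n_models, k) * p**k * (1 - p)**(n_models - k)
               for k in range(needed, n_models + 1))

for n in (1, 3, 11):
    print(n, "models:", round(majority_vote_accuracy(0.6, n), 3))
```

With `p = 0.6`, three voters are jointly right with probability 0.648, and the figure keeps rising as voters are added. The same formula shows the flip side: if each model is worse than chance, voting makes the ensemble worse still.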

algorithm, artificial intelligence, machine learning, (15 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)

Comments on: "A Random Forest Guided Tour" by G. Biau and E. Scornet

Arlot, Sylvain; Genuer, Robin

arXiv.org Machine Learning | Apr-6-2016

This paper is a comment on the survey paper by Biau and Scornet (2016) about random forests. We focus on the problem of quantifying the impact of each ingredient of random forests on their performance. We show that such a quantification is possible for a simple pure forest, leading to conclusions that could apply more generally. Then, we consider "holdout" random forests, which are a good middle point between "toy" pure forests and Breiman's original random forests. We would like to thank G. Biau and E. Scornet for their clear and thought-provoking survey (Biau and Scornet, 2016).

artificial intelligence, machine learning, random forest, (18 more...)

arXiv:1604.01515

Country:

Europe > Austria > Vienna (0.14)
Europe > France (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Walmart and Random Forest

@machinelearnbot | Apr-4-2016, 07:41:37 GMT

In the recent Walmart Kaggle competition I used a Random Forest classifier to solve a market basket problem. A market basket model is built on the idea that relationships exist between items purchased together. For example, a person purchasing a new toothbrush is more likely to also purchase toothpaste than motor oil in the same shopping trip. Retailers use these market basket relationships in the design of their stores for ease of use and also to increase sales. In this specific problem Walmart has broken up their shopping trips into 38 unique 'TripType' values.

artificial intelligence, machine learning, walmart and random forest, (8 more...)

Industry: Retail (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.63)

Installing XGBoost For Anaconda on Windows (IT Best Kept Secret Is Optimization)

#artificialintelligence | Apr-4-2016, 00:55:28 GMT

XGBoost is a recent implementation of Boosted Trees. It is a machine learning algorithm that yields great results in recent Kaggle competitions. I decided to install it on my computers to give it a try. Installation on OSX was straightforward using these instructions. Installation on Windows was not as straightforward.

artificial intelligence, machine learning, xgboost, (16 more...)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.77)

How to Set Up Distributed XGBoost on MapR-FS

#artificialintelligence | Apr-2-2016, 20:00:43 GMT

XGBoost is a library designed for boosted (tree) algorithms. It has become a popular machine learning framework among data science practitioners, especially on Kaggle, a platform for data prediction competitions where researchers post their data and statisticians and data miners compete to produce the best models. For structured learning problems on Kaggle, it can be difficult to get into the top 10 without including XGBoost. Typically, data scientists train XGBoost models on single multi-threaded machines. Very few people have deployed XGBoost in a distributed environment and achieved good performance.

artificial intelligence, machine learning, xgboost, (17 more...)

Industry: Education (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
