AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

A Comparison of Resampling and Recursive Partitioning Methods in Random Forest for Estimating the Asymptotic Variance Using the Infinitesimal Jackknife

Brokamp, Cole, Rao, MB, Ryan, Patrick, Jandarov, Roman

arXiv.org Machine Learning, Jan-29-2018

The infinitesimal jackknife (IJ) has recently been applied to the random forest to estimate its prediction variance. These theorems were verified under a traditional random forest framework which uses classification and regression trees (CART) and bootstrap resampling. However, random forests using conditional inference (CI) trees and subsampling have been found not to be prone to variable selection bias. Here, we conduct simulation experiments using a novel approach to explore the applicability of the IJ to random forests using variations on the resampling method and base learner. Test data points were simulated and each trained using random forest on one hundred simulated training data sets using different combinations of resampling and base learners. Using CI trees instead of traditional CART trees, as well as using subsampling instead of bootstrap sampling, resulted in a much more accurate estimation of prediction variance when using the IJ. The random forest variations here have been incorporated into an open source software package for the R programming language.
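As a rough illustration of the IJ idea mentioned in this abstract, the sketch below computes the commonly cited form of the estimate for a bagged predictor: the sum over training rows of the squared covariance, taken across resamples, between how often a row was drawn and what the base learner predicted. It is not the authors' R package; the function name `ij_variance` and its inputs are assumptions made for this example, and no Monte Carlo bias correction is applied.

```python
from statistics import fmean


def ij_variance(counts, preds):
    """Infinitesimal-jackknife variance estimate at one test point.

    counts[b][i]: times training row i appears in resample b (hypothetical input)
    preds[b]:     prediction of the b-th base learner at the test point
    """
    n_resamples = len(preds)
    n_rows = len(counts[0])
    mean_pred = fmean(preds)
    total = 0.0
    for i in range(n_rows):
        mean_count = fmean(counts[b][i] for b in range(n_resamples))
        # Covariance, across resamples, between row i's inclusion count
        # and the base learner's prediction.
        cov = fmean(
            (counts[b][i] - mean_count) * (preds[b] - mean_pred)
            for b in range(n_resamples)
        )
        total += cov ** 2
    return total


# Toy usage: three resamples of a two-row training set, where each
# "tree" simply predicts the mean of the rows it saw (data = [1, 3]).
toy_counts = [[2, 0], [0, 2], [1, 1]]
toy_preds = [1.0, 3.0, 2.0]
toy_estimate = ij_variance(toy_counts, toy_preds)
```

The same function applies whether the counts come from bootstrap sampling or subsampling; only the way `counts` is generated differs.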

artificial intelligence, machine learning, variance, (19 more...)

arXiv.org Machine Learning

doi: 10.1002/sta4.162

1706.0615

Country: Europe > Austria > Vienna (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Health & Medicine (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Extreme Gradient Boosting with XGBoost

@machinelearnbot, Jan-23-2018, 01:02:13 GMT

Do you know the basics of supervised learning and want to learn to use state-of-the-art models on real-world datasets? Gradient boosting is currently one of the most popular techniques for efficient modeling of tabular datasets of all sizes. XGBoost is a very fast, scalable implementation of gradient boosting that has taken data science by storm: models built with XGBoost regularly win online data science competitions and are used at scale across different industries. In this course, you'll learn how to use this powerful library alongside pandas and scikit-learn to build and tune supervised learning models. You'll work with real-world datasets to solve classification as well as regression problems.

artificial intelligence, machine learning, xgboost, (4 more...)

@machinelearnbot

Country: North America > United States > Iowa > Story County > Ames (0.07)

Genre: Instructional Material (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Optimizing Prediction Intervals by Tuning Random Forest via Meta-Validation

Bayley, Sean, Falessi, Davide

arXiv.org Machine Learning, Jan-22-2018

Recent studies have shown that tuning prediction models increases prediction accuracy and that Random Forest can be used to construct prediction intervals. However, to the best of our knowledge, no study has investigated the need to, and the manner in which one can, tune Random Forest for optimizing prediction intervals; this paper aims to fill this gap. We explore a tuning approach that combines an effectively exhaustive search with a validation technique on a single Random Forest parameter. This paper investigates which, out of eight validation techniques, are beneficial for tuning, i.e., which automatically choose a Random Forest configuration constructing prediction intervals that are reliable and with a smaller width than the default configuration. Additionally, we present and validate three meta-validation techniques to determine which are beneficial, i.e., those which automatically choose a beneficial validation technique. This study uses data from our industrial partner (Keymind Inc.) and the Tukutuku Research Project, related to post-release defect prediction and Web application effort estimation, respectively. Results from our study indicate that: i) the default configuration is frequently unreliable, ii) most of the validation techniques, including previously successfully adopted ones such as 50/50 holdout and bootstrap, are counterproductive in most of the cases, and iii) the 75/25 holdout meta-validation technique is always beneficial; i.e., it avoids the likely counterproductive effects of validation techniques.
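The two criteria this abstract uses for judging prediction intervals, reliability and width, can be made concrete with a short sketch. It assumes that "reliable" means the observed coverage reaches a nominal level (0.90 here, an illustrative choice); the paper's exact definition may differ, and the function names are hypothetical.

```python
def interval_quality(intervals, actuals):
    """Return (coverage, mean_width) for a list of (low, high) intervals."""
    hits = sum(1 for (low, high), y in zip(intervals, actuals) if low <= y <= high)
    coverage = hits / len(actuals)
    mean_width = sum(high - low for low, high in intervals) / len(intervals)
    return coverage, mean_width


def is_beneficial(tuned, default, actuals, nominal=0.90):
    """Assumed reading of 'beneficial': the tuned intervals reach the
    nominal coverage and are narrower on average than the default ones."""
    tuned_coverage, tuned_width = interval_quality(tuned, actuals)
    _, default_width = interval_quality(default, actuals)
    return tuned_coverage >= nominal and tuned_width < default_width
```

For example, `interval_quality([(0, 2), (1, 3), (5, 6)], [1, 4, 5.5])` covers two of the three actual values, so a configuration producing those intervals would not count as reliable at a 0.90 nominal level.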

artificial intelligence, configuration, machine learning, (15 more...)

arXiv.org Machine Learning

1801.07194

Country:

Europe > Portugal > Braga > Braga (0.04)
Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Virginia > Falls Church (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
(2 more...)

Gradient Boosting in TensorFlow vs XGBoost

@machinelearnbot, Jan-18-2018, 19:45:33 GMT

TensorFlow 1.4 was released a few weeks ago with an implementation of gradient boosting called TensorFlow Boosted Trees (TFBT). Unfortunately, the paper does not have any benchmarks, so I ran some against XGBoost. For many Kaggle-style data mining problems, XGBoost has been the go-to solution since its release in 2014. It's probably as close to an out-of-the-box machine learning algorithm as you can get today, as it gracefully handles un-normalized or missing data, while being accurate and fast to train. The code to reproduce the results in this article is on GitHub.

artificial intelligence, machine learning, xgboost, (11 more...)

@machinelearnbot

Country: North America > United States (0.05)

Industry:

Transportation > Passenger (0.52)
Transportation > Air (0.52)
Consumer Products & Services > Travel (0.52)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

How to Install XGBoost for Python on macOS - Machine Learning Mastery

#artificialintelligence, Jan-18-2018, 06:11:59 GMT

XGBoost is a library for developing very fast and accurate gradient boosting models. It is a library at the center of many winning solutions in Kaggle data science competitions. In this tutorial, you will discover how to install the XGBoost library for Python on macOS. Note: I have used this procedure for years on a range of different macOS versions and it has not changed.

artificial intelligence, machine learning, xgboost, (8 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.39)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Learn Gradient Boosting Algorithm for better predictions (with codes in R)

@machinelearnbot, Jan-16-2018, 14:09:57 GMT

The accuracy of a predictive model can be boosted in two ways: either by embracing feature engineering or by applying boosting algorithms straight away. Having participated in lots of data science competitions, I've noticed that people prefer to work with boosting algorithms, as they take less time and produce similar results. There are multiple boosting algorithms, such as Gradient Boosting, XGBoost, AdaBoost, and Gentle Boost. Every algorithm has its own underlying mathematics, and a slight variation is observed while applying them. If you are new to this, great! You shall be learning all these concepts in a week's time from now.
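To make the boosting idea concrete, here is a minimal sketch of gradient boosting for squared-error regression on a single feature, using one-split decision stumps as weak learners. It is written in Python rather than the R used by the linked tutorial, it is not taken from that tutorial, and all function names are hypothetical. It assumes the feature has at least two distinct values.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = sum((r - left_mean) ** 2 for r in left)
        sse += sum((r - right_mean) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - p)^2 the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [
            p + learning_rate * (left_value if x <= threshold else right_value)
            for x, p in zip(xs, preds)
        ]
    return base, learning_rate, stumps


def predict(model, x):
    """Sum the shrunken stump outputs on top of the base prediction."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left if x <= threshold else right for threshold, left, right in stumps
    )
```

Fitting `fit_gradient_boosting([1, 2, 3, 4], [1, 1, 5, 5])` moves the predictions a little closer to the targets each round; a smaller `learning_rate` needs more rounds but usually overfits less. Libraries such as XGBoost add regularization, deeper trees, and many engineering optimizations on top of this basic loop.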

artificial intelligence, learner, machine learning, (10 more...)

@machinelearnbot

Genre: Contests & Prizes (0.36)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Gradient Boosting in TensorFlow vs XGBoost - Nicolò Valigi

@machinelearnbot, Jan-16-2018, 12:18:36 GMT

artificial intelligence, machine learning, xgboost, (10 more...)

@machinelearnbot

Country: North America > United States (0.05)

Industry:

Transportation > Air (0.53)
Consumer Products & Services > Travel (0.53)
Transportation > Passenger (0.33)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

slundberg/shap

#artificialintelligence, Jan-14-2018, 07:16:11 GMT

SHAP (SHapley Additive exPlanations) explains the output of any machine learning model using expectations and Shapley values. SHAP unifies aspects of several previous methods [1-7] and represents the only possible consistent and locally accurate additive feature attribution method based on expectations (see SHAP paper for details). While SHAP values can explain the output of any machine learning model, we have developed a high-speed exact algorithm for ensemble tree methods (Tree SHAP paper). The above explanation shows features each contributing to push the model output from the base value (the average model output over the training dataset we passed) to the model output. Features pushing the prediction higher are shown in red, those pushing the prediction lower are in blue.
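The Shapley values underlying SHAP can be computed exactly for a handful of features by enumerating every feature subset. The sketch below does that in plain Python and does not use the shap package. As a simplifying assumption, a "missing" feature is replaced by a single baseline value, whereas SHAP itself works with expectations over a background dataset; the function name and arguments are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley attributions of model(x) relative to model(baseline)."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their values from x, the rest from baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                with_i = value(set(subset) | {i})
                without_i = value(set(subset))
                phi[i] += weight * (with_i - without_i)
    return phi


def toy_model(v):
    """Hypothetical model with two linear terms and one interaction."""
    return 2 * v[0] + 3 * v[1] + v[0] * v[2]


attributions = shapley_values(toy_model, [1, 1, 1], [0, 0, 0])
```

The attributions sum to the difference between the model output at `x` and at the baseline, which is the local accuracy property the description refers to. The cost grows exponentially with the number of features, which is why specialized algorithms such as the one for tree ensembles matter in practice.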

artificial intelligence, dataset, machine learning, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)

Cosmic String Detection with Tree-Based Machine Learning

Sadr, A. Vafaei, Farhang, M., Movahed, S. M. S., Bassett, B., Kunz, M.

arXiv.org Machine Learning, Jan-12-2018

We explore the use of random forest and gradient boosting, two powerful tree-based machine learning algorithms, for the detection of cosmic strings in maps of the cosmic microwave background (CMB), through their unique Gott-Kaiser-Stebbins effect on the temperature anisotropies. The information in the maps is compressed into feature vectors before being passed to the learning units. The feature vectors contain various statistical measures of processed CMB maps that boost the cosmic string detectability. Our proposed classifiers, after training, give results improved over or similar to the claimed detectability levels of the existing methods for string tension, $G\mu$. They can make $3\sigma$ detection of strings with $G\mu \gtrsim 2.1\times 10^{-10}$ for noise-free, $0.9'$-resolution CMB observations. The minimum detectable tension increases to $G\mu \gtrsim 3.0\times 10^{-8}$ for a more realistic, CMB S4-like (II) strategy, still a significant improvement over the previous results.

artificial intelligence, data mining, machine learning, (16 more...)

arXiv.org Machine Learning

1801.0414

Country:

Africa > South Africa > Western Cape > Cape Town (0.05)
Europe > Switzerland > Geneva > Geneva (0.04)
Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Data Science > Data Mining > Feature Extraction (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.55)

Hyperparameter Tuning the Random Forest in Python – Towards Data Science

#artificialintelligence, Jan-10-2018, 21:04:05 GMT

I have included Python code in this article where it is most instructive. Full code and data to follow along can be found on the project GitHub page. The best way to think about hyperparameters is as the settings of an algorithm that can be adjusted to optimize performance, just as we might turn the knobs of an AM radio to get a clear signal (or your parents might have!). While model parameters are learned during training -- such as the slope and intercept in a linear regression -- hyperparameters must be set by the data scientist before training. In the case of a random forest, hyperparameters include the number of decision trees in the forest and the number of features considered by each tree when splitting a node.
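One simple way to search over such settings is an exhaustive grid: try every combination and keep the one with the best validation score. The sketch below is a generic, library-free version and is not the code from the linked article. The keys `n_estimators` and `max_features` follow scikit-learn's random forest naming, and `toy_score` is a hypothetical stand-in for "train a forest with these settings and return its validation score".

```python
from itertools import product


def grid_search(param_grid, score):
    """Try every combination in param_grid; return (best_params, best_score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)  # higher is better
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    """Hypothetical score that peaks at 200 trees and 3 features per split."""
    return -abs(params["n_estimators"] - 200) - 10 * abs(params["max_features"] - 3)


grid = {"n_estimators": [100, 200, 400], "max_features": [2, 3, 4]}
best_params, best_score = grid_search(grid, toy_score)
```

The number of combinations is the product of the list lengths, so grids grow quickly; a random search over the same ranges is a common cheaper alternative.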

artificial intelligence, decision tree learning, machine learning, (18 more...)

#artificialintelligence

Country: North America > United States > Washington > King County > Seattle (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)
