AITopics | Ensemble Learning


Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
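As a rough illustration of the definition above, the following minimal sketch combines three classifiers by majority vote. The labels and model outputs are made-up toy data, and the names `majority_vote`, `accuracy`, and `model_outputs` are hypothetical; it only shows how an ensemble can outperform each constituent when their errors fall on different samples.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most common label among the votes cast for one sample."""
    return Counter(votes).most_common(1)[0][0]


def accuracy(predicted, actual):
    """Fraction of samples where the prediction matches the true label."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data: true labels and the outputs of three imperfect models.
labels = [1, 0, 1, 1, 0, 0]
model_outputs = {
    "model_a": [0, 0, 1, 1, 0, 0],  # wrong on sample 0
    "model_b": [1, 1, 1, 1, 0, 0],  # wrong on sample 1
    "model_c": [1, 0, 0, 1, 0, 0],  # wrong on sample 2
}

# Combine the per-sample votes of all models into one ensemble prediction.
ensemble = [majority_vote(votes) for votes in zip(*model_outputs.values())]

for name, outputs in model_outputs.items():
    print(name, accuracy(outputs, labels))
print("ensemble", accuracy(ensemble, labels))
```

Because each model errs on a different sample, the other two outvote it every time. When the constituent models make the same mistakes, voting brings no such gain.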


Choosing features for random forests algorithm

@machinelearnbot | Apr-1-2016, 14:35:14 GMT

There are many ways to choose features from given data, and it is always a challenge to pick the ones with which a particular algorithm will work best. Here I will consider data from monitoring the performance of physical exercises with wearable accelerometers, for example, wrist bands. The data for this project come from this source: http://groupware.les.inf.puc-rio.br/har. In this project, researchers used data from accelerometers on the belt, forearm, arm, and dumbbell of a few participants. They were asked to perform barbell lifts correctly, marked as "A", and incorrectly with four typical mistakes, marked as "B", "C", "D" and "E".

artificial intelligence, decision tree learning, random forest algorithm, (3 more...)


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)


Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python

#artificialintelligence | Mar-31-2016, 19:43:07 GMT

If you have been using GBM as a 'black box' until now, maybe it's time for you to open it and see how it actually works! This article is inspired by Owen Zhang's (Chief Product Officer at DataRobot and Kaggle Rank 3) approach shared at NYC Data Science Academy. He delivered a two-hour talk, and I intend to condense it and present the most precious nuggets here. Boosting algorithms play a crucial role in dealing with the bias-variance trade-off. Unlike bagging algorithms, which only control for high variance in a model, boosting controls both aspects (bias and variance) and is considered to be more effective.
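To make the boosting idea concrete, here is a minimal sketch of gradient boosting for squared-error regression on a single feature, using depth-one trees (stumps). It is not the implementation the article tunes; the toy data and the names `fit_stump`, `gradient_boost`, `n_rounds`, and `learning_rate` are assumptions chosen for illustration.

```python
def fit_stump(xs, residuals):
    """Fit a depth-one regression tree: pick the threshold that minimizes squared error."""
    best = None
    # Dropping the largest value guarantees that both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    preds = [base] * len(ys)
    for _ in range(n_rounds):
        # For squared error, the negative gradient is simply the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data, made up for this sketch.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 1.9, 3.1, 4.2, 4.8, 6.1, 7.0]

mean_value = sum(ys) / len(ys)
baseline = lambda x: mean_value
one_round = gradient_boost(xs, ys, n_rounds=1)
fifty_rounds = gradient_boost(xs, ys, n_rounds=50)

for name, model in [("mean only", baseline), ("1 round", one_round), ("50 rounds", fifty_rounds)]:
    print(name, mean_squared_error(model, xs, ys))
```

Each round lowers the training error a little, which is how boosting reduces bias. The number of rounds and the learning rate trade off against each other, and the error measured here is training error only, so a held-out set would be needed to judge overfitting.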

artificial intelligence, machine learning, optimum value, (16 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)


The Shape of the Trees in Gradient Boosting Machines

#artificialintelligence | Mar-29-2016, 13:12:15 GMT

Our CEO and founder, Dr. Dan Steinberg, recently wrote about gradient boosting machines. Gradient boosting machines are a powerful machine learning technique and have been deployed with great success over the years in Kaggle competitions. However, the specifics of the construction and the core ideas of gradient boosting machines can remain a bit murky. For a more detailed look at the shapes and sizes of the trees formed in gradient boosting machines, read the discussion on Dr. Steinberg's blog:

artificial intelligence, gradient, machine learning


Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)


Telstra Network Disruption, Winner's Interview: 1st place, Mario Filho

#artificialintelligence | Mar-27-2016, 21:19:17 GMT

Telstra Network Disruptions challenged Kagglers to predict the severity of service disruptions on their network. Using a dataset of features from their service logs, participants were tasked with predicting whether a disruption was a momentary glitch or a total interruption of connectivity. Mario Filho, a self-taught data scientist, took first place in his first "solo win". In this blog, he shares a high-level view of his approach. My background in machine learning is completely "self-taught". It all began in 2012 when I decided to learn Calculus on my own through the videos from an MIT class.

artificial intelligence, machine learning, mario filho, (10 more...)


Genre:

Personal > Interview (0.40)
Instructional Material (0.33)

Industry: Education (0.39)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.35)


dmlc/xgboost

#artificialintelligence | Mar-25-2016, 01:15:45 GMT

This page contains a curated list of examples, tutorials, and blogs about XGBoost use cases. It is inspired by awesome-MXNet, awesome-php and awesome-machine-learning. Please send a pull request if you find things that belong here. This is a list of short code examples introducing different functionalities of the xgboost packages. Most of the examples in this section are based on the CLI or Python version.

artificial intelligence, dmlc xgboost, machine learning, (2 more...)


Genre: Instructional Material > Course Syllabus & Notes (0.31)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)


Lost in a random forest: Using Big Data to study rare events | News & Analysis

#artificialintelligence | Mar-24-2016, 21:45:43 GMT

Sudden, broad-scale shifts in public opinion about social problems are relatively rare. Until recently, social scientists were forced to conduct post-hoc case studies of such unusual events that ignore the broader universe of possible shifts in public opinion that do not materialize. The vast amount of data that has recently become available via social media sites such as Facebook and Twitter--as well as the mass-digitization of qualitative archives--provides an unprecedented opportunity for scholars to avoid such selection on the dependent variable. Yet the sheer scale of these new data creates a new set of methodological challenges. Conventional linear models, for example, minimize the influence of rare events as "outliers"--especially within analyses of large samples.

artificial intelligence, data mining, machine learning, (15 more...)


Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
(2 more...)


Walmart Kaggle: Trip Type Classification

@machinelearnbot | Mar-23-2016, 09:30:53 GMT

They took the NYC Data Science Academy 12-week full-time data science bootcamp program from Sep. 23 to Dec. 18, 2015. The post was based on their fourth in-class project (due after the 8th week of the program). Walmart uses trip type classification to segment its shoppers and their store visits in order to improve the shopping experience. Walmart's trip types are created from a combination of existing customer insights and purchase history data. The purpose of the Kaggle competition is to use only the purchase data provided to derive Walmart's classification labels.

machine learning, natural language, text classification, (16 more...)


Industry: Retail (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.43)


XGboost Archives - The Big Data Blog

#artificialintelligence | Mar-22-2016, 14:26:28 GMT

We learn more from code, especially great code, and not necessarily only from the first-place solution, because we also learn what separates a stellar solution from a merely good one. I will post the solutions I come across so we can all learn to become better! I collected the following source code and interesting discussions from Kaggle-hosted competitions for learning purposes. Not all competitions are listed, because I am collecting them manually and, for some competitions, no one has shared a solution.

artificial intelligence, data mining, machine learning, (3 more...)


Technology:

Information Technology > Data Science > Data Mining > Big Data (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)


XGBoost4J: Portable Distributed XGBoost in Spark, Flink and Dataflow

#artificialintelligence | Mar-21-2016, 23:40:50 GMT

XGBoost is a library designed and optimized for tree boosting. The gradient boosting trees model was originally proposed by Friedman et al. By embracing multi-threading and introducing regularization, XGBoost delivers higher computational power and more accurate prediction. More than half of the winning solutions in machine learning challenges hosted at Kaggle adopt XGBoost (incomplete list). XGBoost provides native interfaces for C, R, Python, Julia and Java users.

artificial intelligence, machine learning, xgboost, (14 more...)


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)


XGBoost: A Scalable Tree Boosting System

#artificialintelligence | Mar-21-2016, 14:55:40 GMT

Abstract: Tree boosting is a highly effective and widely used machine learning method. In this paper, we describe a scalable end-to-end tree boosting system called XGBoost, which is used widely by data scientists to achieve state-of-the-art results on many machine learning challenges. We propose a novel sparsity-aware algorithm for sparse data and weighted quantile sketch for approximate tree learning. More importantly, we provide insights on cache access patterns, data compression and sharding to build a scalable tree boosting system.
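The sparsity-aware idea mentioned in the abstract can be sketched roughly as follows: at a candidate split, rows with a missing feature value are sent in whichever "default direction" gives the larger gain. This is a simplified illustration for a single fixed threshold, not the paper's full algorithm or the library's code. The gain expression is the regularized second-order form used for tree boosting; the toy gradients and the names `split_gain`, `best_default_direction`, and `reg_lambda` are assumptions.

```python
def split_gain(g_left, h_left, g_right, h_right, reg_lambda=1.0):
    """Regularized gain of a split, from summed gradients (g) and Hessians (h)."""
    def score(g, h):
        return g * g / (h + reg_lambda)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right)
                  - score(g_left + g_right, h_left + h_right))


def best_default_direction(values, grads, hess, threshold, reg_lambda=1.0):
    """Try sending missing values (None) left, then right; keep the better choice."""
    g_left = h_left = g_right = h_right = g_miss = h_miss = 0.0
    for value, g, h in zip(values, grads, hess):
        if value is None:
            g_miss += g
            h_miss += h
        elif value < threshold:
            g_left += g
            h_left += h
        else:
            g_right += g
            h_right += h
    gain_if_left = split_gain(g_left + g_miss, h_left + h_miss,
                              g_right, h_right, reg_lambda)
    gain_if_right = split_gain(g_left, h_left,
                               g_right + g_miss, h_right + h_miss, reg_lambda)
    if gain_if_left >= gain_if_right:
        return "left", gain_if_left
    return "right", gain_if_right


# Hypothetical toy data: None marks a missing feature value.
values = [1.0, 2.0, None, 8.0, 9.0, None]
grads = [-1.0, -1.2, 0.9, 1.0, 1.1, 1.0]  # made-up first-order gradients
hess = [1.0] * 6                           # squared-error loss has Hessian 1

direction, gain = best_default_direction(values, grads, hess, threshold=5.0)
print(direction, gain)
```

In this toy data the rows with missing values have gradients similar to the rows on the high side of the threshold, so grouping them there yields the larger gain. Only the non-missing entries need to be compared against the threshold, which is what makes the approach attractive for sparse data.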

artificial intelligence, machine learning, xgboost


Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
