AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

An Information-Gain-based Feature Ranking Function for XGBoost

#artificialintelligence, Jun-24-2016, 03:01:08 GMT

XGBoost (short for Extreme Gradient Boosting) is a relatively new classification technique in machine learning that has grown increasingly popular because of its exceptional performance in multiple competitions hosted on Kaggle.com. A lesser-known benefit of XGBoost is that the tree ensemble model can rank features for high-dimensional data sets. The official implementation of XGBoost (Python) provides only one feature scoring function, called get_fscore. It computes feature scores by counting how many times each feature appears in the splits and ranks the features by that count. It is simple and straightforward, but I believe we should not ignore another metric which is critical to the decision tree method.
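The contrast between ranking by split count and ranking by information gain can be sketched in plain Python. This is a toy illustration, not XGBoost's API: the `splits` list, the feature names `f0` and `f1`, and the helper functions are all hypothetical, and entropy-based information gain is assumed as the second metric because the article's title names it.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(parent, left, right):
    """Entropy reduction from splitting `parent` into `left` and `right`."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


# Hypothetical splits from a toy ensemble:
# (feature name, parent labels, left child labels, right child labels)
splits = [
    ("f0", [0, 0, 1, 1], [0, 0], [1, 1]),  # separates the classes perfectly
    ("f1", [0, 0, 1, 1], [0, 1], [0, 1]),  # separates nothing
    ("f1", [0, 1, 1, 1], [0, 1], [1, 1]),  # separates a little
]

split_count = Counter()          # how often each feature is used in a split
total_gain = defaultdict(float)  # summed information gain per feature
for feature, parent, left, right in splits:
    split_count[feature] += 1
    total_gain[feature] += information_gain(parent, left, right)

rank_by_count = sorted(split_count, key=split_count.get, reverse=True)
rank_by_gain = sorted(total_gain, key=total_gain.get, reverse=True)
print("by split count:", rank_by_count)
print("by total gain: ", rank_by_gain)
```

In this toy data `f1` is used in more splits but `f0` contributes more gain, so the two rankings disagree, which is the situation the excerpt argues a count-only score can miss.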

artificial intelligence, information gain, machine learning, (15 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Pruning Random Forests for Prediction on a Budget

Nan, Feng, Wang, Joseph, Saligrama, Venkatesh

arXiv.org Machine Learning, Jun-16-2016

We propose to prune a random forest (RF) for resource-constrained prediction. We first construct an RF and then prune it to optimize expected feature cost and accuracy. We pose pruning RFs as a novel 0-1 integer program with linear constraints that encourages feature re-use. We establish total unimodularity of the constraint set to prove that the corresponding LP relaxation solves the original integer program. We then exploit connections to combinatorial optimization and develop an efficient primal-dual algorithm, scalable to large datasets. In contrast to our bottom-up approach, which benefits from good RF initialization, conventional methods are top-down, acquiring features based on their utility value, and are generally intractable, requiring heuristics. Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms.

artificial intelligence, machine learning, optimization problem, (20 more...)

arXiv.org Machine Learning

1606.0506

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Italy > Tuscany > Florence (0.04)
Asia > Middle East > Israel > Haifa District > Haifa (0.04)

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Constraint-Based Reasoning (0.68)
(2 more...)

Making Tree Ensembles Interpretable

Hara, Satoshi, Hayashi, Kohei

arXiv.org Machine Learning, Jun-16-2016

Tree ensembles, such as random forests and boosted trees, are renowned for their high prediction performance, whereas their interpretability is critically limited. In this paper, we propose a post-processing method that improves the model interpretability of tree ensembles. After learning a complex tree ensemble in a standard way, we approximate it by a simpler model that is interpretable for humans. To obtain the simpler model, we derive the EM algorithm minimizing the KL divergence from the complex ensemble. A synthetic experiment showed that a complicated tree ensemble was approximated reasonably well by an interpretable model.

artificial intelligence, decision tree learning, machine learning, (15 more...)

arXiv.org Machine Learning

1606.0539

Country:

Asia > Middle East > Jordan (0.05)
Asia > Japan (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)

Ensemble Machine Learning Algorithms in Python with scikit-learn - Machine Learning Mastery

#artificialintelligence, Jun-13-2016, 15:26:28 GMT

Ensembles can give you a boost in accuracy on your dataset. In this post you will discover how you can create some of the most powerful types of ensembles in Python using scikit-learn. This case study will step you through Boosting, Bagging and Majority Voting and show you how you can continue to ratchet up the accuracy of the models on your own datasets. It assumes you are generally familiar with machine learning algorithms and ensemble methods and that you are looking for information on how to create ensembles in Python.
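Of the three techniques named above, majority voting is the simplest to sketch. The example below is a stdlib-only illustration rather than the scikit-learn code from the post: the three prediction lists and the `truth` labels are made up, and `majority_vote` and `accuracy` are hypothetical helpers.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by plurality."""
    combined = []
    for sample_votes in zip(*predictions):
        label, _ = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined


def accuracy(predicted, actual):
    """Fraction of positions where the predicted label matches the actual one."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Made-up labels: each model is wrong on a different sample.
truth = [1, 0, 1, 1]
model_a = [0, 0, 1, 1]
model_b = [1, 1, 1, 1]
model_c = [1, 0, 0, 1]

ensemble = majority_vote([model_a, model_b, model_c])
for name, preds in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, accuracy(preds, truth))
```

Because the three models err on different samples, the vote corrects each individual mistake. The benefit shrinks when the models make correlated errors.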

algorithm, artificial intelligence, machine learning, (12 more...)

#artificialintelligence

Country: North America > United States (0.55)

Genre: Instructional Material > Course Syllabus & Notes (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)

XGBoost workshop and meetup talk with Tianqi Chen Data Science Los Angeles

#artificialintelligence, Jun-6-2016, 16:45:50 GMT

Proof of this and also because XGBoost has an easy-to-use interface from both R and Python, XGBoost has become a favorite tool in Kaggle competitions. Besides feature engineering, cross-validation and ensembling, XGBoost is a key ingredient for achieving the highest accuracy in many data science competitions and, more importantly, in practical applications. We were fortunate to recently host Tianqi Chen, the main author of XGBoost, in a workshop and a meetup talk in Santa Monica, California. First, we started with an advanced workshop in the afternoon, for which anyone could apply to participate, but there were only a dozen spots available (which got us some expert users of XGBoost, but unfortunately we had to reject some good people too, sorry). This advanced workshop had 2 sessions.

artificial intelligence, machine learning, xgboost, (7 more...)

#artificialintelligence

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.40)
North America > United States > California > Los Angeles County > Santa Monica (0.26)

Genre:

Instructional Material > Course Syllabus & Notes (0.81)
Contests & Prizes (0.58)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

XGBoost explained • /r/MachineLearning

#artificialintelligence, Jun-5-2016, 21:50:51 GMT

To expand: according to my naive understanding, boosted trees are basically an ensemble of decision trees which are fit sequentially so that each new tree makes up for the errors of the previously existing set of trees. The model is "boosted" by focusing new additions on correcting the residual errors of the last version of the model. The "gradient" comes in afterward as the parameters of the tree ensemble are optimized to minimize the error of the whole "base learner". I think of this as fine tuning of the boosted tree ensemble using a gradient-based optimization.
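The sequential fitting described above can be sketched with one-split trees (stumps) on a 1-D feature. This is a minimal illustration under stated assumptions, not XGBoost's algorithm: the data, the `fit_stump` and `boost` helpers, and the learning rate are all made up. One point worth noting alongside the commenter's description: for squared-error loss, the residuals each new tree is fit to are exactly the negative gradient of the loss with respect to the current predictions, so the "gradient" is already present in the residual-fitting step.

```python
def fit_stump(xs, residuals):
    """Pick the threshold on a 1-D feature that minimizes squared error."""
    best = None
    # Skip the largest value so both sides of the split are non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Fit stumps one after another, each to the current residuals."""
    base = sum(ys) / len(ys)          # start from the mean prediction
    predictions = [base] * len(ys)
    stumps, history = [], []
    for _ in range(n_rounds):
        # For squared error, residuals equal the negative gradient of the loss.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys))
    return base, stumps, history


# Made-up 1-D regression data with three rough plateaus.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0, 5.2, 5.0]
base, stumps, mse_history = boost(xs, ys)
print("MSE after round 1: ", mse_history[0])
print("MSE after round 20:", mse_history[-1])
```

Each round shrinks the training error because the new stump is fit to what the current ensemble still gets wrong. The learning rate scales down each correction, trading slower convergence for less overfitting.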

machine learning, machinelearning, social media, (4 more...)

#artificialintelligence

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)

szilard/xgboost-adv-workshop-LA

#artificialintelligence, Jun-3-2016, 02:34:37 GMT

Tianqi Chen will be in Santa Monica, June 2, 2016 and besides a meetup talk in the evening (already sold out, sorry) I'm also organizing an advanced workshop in the afternoon (3:00-6:00pm) to do more advanced stuff. There will be only 10 spots for the workshop and you'll have to apply by filling out this form (Update: workshop is full.). The workshop will be a mix of Tianqi talking about more advanced stuff and participants interacting, asking questions etc. (partly hands-on, bring your laptop and your specific questions/problems/datasets). We can use this github repo (issues, PR) for setting up questions/problems/topics etc. to be discussed in the workshop, feel free to participate. Location disclosed only to the selected participants.

artificial intelligence, machine learning, szilard xgboost-adv-workshop-la, (2 more...)

#artificialintelligence

Country: North America > United States > California > Los Angeles County > Santa Monica (0.30)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)

How to use XGBoost algorithm in R in easy steps

#artificialintelligence, Jun-2-2016, 23:20:36 GMT

Did you know that using the XGBoost algorithm is one of the popular winning recipes in data science competitions? So, what makes it more powerful than a traditional Random Forest or Neural Network? In the last few years, predictive modeling has become much faster and more accurate. I remember spending long hours on feature engineering to improve a model by a few decimals. A lot of that difficult work can now be done by using better algorithms.

algorithm, artificial intelligence, machine learning, (13 more...)

#artificialintelligence

Genre: Contests & Prizes (0.36)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.77)

Random forest - impute or remove NA values? Which is the better approach? • /r/MachineLearning

@machinelearnbot, May-31-2016, 22:16:47 GMT

Can you reduce the parameter space at all (using PCA or something similar)? This would probably improve your results when removing the NAs. Are the NA values present in every dimension? If there are only a couple of dimensions with NAs, try to train without them and see what happens. What does your data represent, and why are there NAs? Depending on what your data corresponds to it may make more or less sense to use imputation.
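The options raised in the reply (remove rows with NAs, train without the affected dimensions, or impute) can be compared on a small table. The sketch below is stdlib-only and illustrative: the data are made up, `None` stands in for NA, and column-mean imputation is assumed as the imputation strategy since the thread does not name one.

```python
# Made-up table; None marks a missing (NA) value.
rows = [
    [5.1, 3.5, None],
    [4.9, None, 1.4],
    [6.2, 2.9, 4.3],
    [5.9, 3.0, 5.1],
]


def drop_incomplete_rows(rows):
    """Keep only rows with no missing values."""
    return [row for row in rows if all(v is not None for v in row)]


def drop_columns_with_missing(rows):
    """Keep only columns (dimensions) that are fully observed."""
    keep = [j for j in range(len(rows[0])) if all(row[j] is not None for row in rows)]
    return [[row[j] for j in keep] for row in rows]


def impute_column_means(rows):
    """Replace each missing value with the mean of its column's observed values."""
    means = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


rows_dropped = drop_incomplete_rows(rows)
cols_dropped = drop_columns_with_missing(rows)
imputed = impute_column_means(rows)
print(len(rows_dropped), "rows kept after dropping incomplete rows")
print(len(cols_dropped[0]), "column(s) kept after dropping incomplete columns")
print(imputed)
```

Dropping rows halves this toy data set, dropping columns keeps every row but discards two features, and imputation keeps everything at the cost of inventing values. Which trade-off is acceptable depends on why the values are missing, as the reply points out.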

better approach, decision tree learning, machinelearning, (5 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

Want to Win at Kaggle? Pay Attention to Your Ensembles.

#artificialintelligence, May-28-2016, 23:30:45 GMT

Summary: Want to win a Kaggle competition or at least get a respectable place on the leaderboard? These days it's all about ensembles, and for a lot of practitioners that means reaching for random forests. Random forests have indeed been very successful, but it's worth remembering that there are three different categories of ensembles and some important hyperparameter tuning issues within each. Here's a brief review. The Kaggle competitions are like formula racing for data science. Winners edge out competitors at the fourth decimal place and, like Formula 1 race cars, not many of us would mistake them for daily drivers.

artificial intelligence, classifier, machine learning, (19 more...)

#artificialintelligence

Industry: Leisure & Entertainment > Sports > Motorsports (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.57)
