AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
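
As a minimal sketch of that definition, the toy example below combines three constituent models by majority vote. The labels and the three prediction lists (`model_a`, `model_b`, `model_c`) are hypothetical values chosen for illustration, not output from any real model; each model errs on different examples, so the vote corrects every individual mistake.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most common label among the constituent predictions."""
    return Counter(predictions).most_common(1)[0][0]

def accuracy(y_pred, y):
    """Fraction of predictions that match the true labels."""
    return sum(p == t for p, t in zip(y_pred, y)) / len(y)

# True labels for six hypothetical examples.
y_true = [1, 0, 1, 1, 0, 1]

# Three hypothetical constituent models, each wrong on two different examples.
model_a = [0, 1, 1, 1, 0, 1]  # wrong on examples 0 and 1
model_b = [1, 0, 0, 0, 0, 1]  # wrong on examples 2 and 3
model_c = [1, 0, 1, 1, 1, 0]  # wrong on examples 4 and 5

# Combine the three predictions for each example by majority vote.
ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]

for name, pred in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, round(accuracy(pred, y_true), 3))
```

The gain depends on the models making different errors; three models that fail on the same examples would gain nothing from voting.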

XGBoost: Implementing the Winningest Kaggle Algorithm in Spark and Flink

@machinelearnbot May-21-2017, 22:16:48 GMT

XGBoost is a library designed and optimized for tree boosting. The gradient boosted trees model was originally proposed by Friedman et al. By using multithreading and introducing regularization, XGBoost delivers faster computation and more accurate predictions. More than half of the winning solutions in machine learning challenges hosted at Kaggle adopt XGBoost. XGBoost provides native interfaces for C, R, Python, Julia and Java users.
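
To make the role of regularization concrete, here is a small pure-Python sketch of gradient tree boosting with depth-one trees (stumps) on a single feature and squared loss. It is not the XGBoost library and leaves out multithreading entirely; it only illustrates the L2-penalized leaf weight, -G / (H + lam), where G and H are the sums of the first and second derivatives of the loss in a leaf. The function names (`fit_stump`, `boost`, `mse`), the toy data and the parameter values are assumptions made for illustration.

```python
def fit_stump(x, grad, hess, lam):
    """Pick the split on a 1-D feature that maximizes the regularized gain.

    Returns (threshold, w_left, w_right). Points with x < threshold go left.
    Assumes x contains at least two distinct values.
    """
    def score(g, h):
        return g * g / (h + lam)

    order = sorted(range(len(x)), key=lambda i: x[i])
    g_total, h_total = sum(grad), sum(hess)
    g_left = h_left = 0.0
    best = None
    for pos in range(len(order) - 1):
        i, j = order[pos], order[pos + 1]
        g_left += grad[i]
        h_left += hess[i]
        if x[i] == x[j]:
            continue  # cannot split between equal feature values
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = score(g_left, h_left) + score(g_right, h_right) - score(g_total, h_total)
        if best is None or gain > best[0]:
            best = (gain, (x[i] + x[j]) / 2,
                    -g_left / (h_left + lam), -g_right / (h_right + lam))
    return best[1:]

def boost(x, y, rounds=20, lam=1.0, eta=0.3):
    """Additive boosting for squared loss; eta is the shrinkage (learning rate)."""
    pred = [0.0] * len(x)
    stumps = []
    for _ in range(rounds):
        grad = [p - t for p, t in zip(pred, y)]  # derivative of 0.5 * (p - t)^2
        hess = [1.0] * len(x)                    # second derivative is constant
        threshold, w_left, w_right = fit_stump(x, grad, hess, lam)
        stumps.append((threshold, w_left, w_right))
        pred = [p + eta * (w_left if xi < threshold else w_right)
                for p, xi in zip(pred, x)]
    return stumps, pred

def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

# Hypothetical toy data: a step between x = 3 and x = 4.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

for rounds in (1, 5, 20):
    _, pred = boost(x, y, rounds=rounds)
    print(rounds, round(mse(pred, y), 4))
```

A larger `lam` shrinks every leaf weight toward zero, which is the sense in which the penalty trades some training fit for stability.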

artificial intelligence, machine learning, xgboost, (14 more...)

@machinelearnbot

Country: North America > Canada > Quebec > Montreal (0.06)

Industry: Education (0.33)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)

Ensemble Machine Learning in Python: Random Forest, AdaBoost

#artificialintelligence May-19-2017, 09:26:05 GMT

In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.

artificial intelligence, decision tree learning, reinforcement learning, (5 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.71)

Industry:

Information Technology (0.93)
Automobiles & Trucks (0.79)
Leisure & Entertainment > Games (0.57)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.76)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
(2 more...)

Chapter 5: Random Forest Classifier – Machine Learning 101 – Medium

@machinelearnbot May-18-2017, 23:05:09 GMT

Let's try out RandomForestClassifier on our previous code for classifying emails into spam or ham. I have created a git repository for the data set and the sample code. It's the same data set discussed in this chapter. I would suggest that you follow the discussion and do the coding yourself. In case it fails, you can use or refer to my version to understand how it works.

artificial intelligence, decision tree learning, machine learning 101, (4 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.45)

Want to Win Competitions? Pay Attention to Your Ensembles.

@machinelearnbot May-18-2017, 14:45:07 GMT

Summary: Want to win a Kaggle competition or at least get a respectable place on the leaderboard? These days it's all about ensembles, and for a lot of practitioners that means reaching for random forests. Random forests have indeed been very successful, but it's worth remembering that there are three different categories of ensembles and some important hyperparameter tuning issues within each. Here's a brief review. The Kaggle competitions are like Formula racing for data science. Winners edge out competitors at the fourth decimal place and, like Formula 1 race cars, not many of us would mistake them for daily drivers.

artificial intelligence, classifier, machine learning, (18 more...)

@machinelearnbot

Industry: Leisure & Entertainment > Sports > Motorsports (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.58)

To tune or not to tune the number of trees in random forest?

Probst, Philipp, Boulesteix, Anne-Laure

arXiv.org Machine Learning May-16-2017

The number of trees T in the random forest (RF) algorithm for supervised learning has to be set by the user. It is controversial whether T should simply be set to the largest computationally manageable value or whether a smaller T may in some cases be better. While the principle underlying bagging is that "more trees are better", in practice the classification error rate sometimes reaches a minimum before increasing again for increasing number of trees. The goal of this paper is four-fold: (i) providing theoretical results showing that the expected error rate may be a non-monotonous function of the number of trees and explaining under which circumstances this happens; (ii) providing theoretical results showing that such non-monotonous patterns cannot be observed for other performance measures such as the Brier score and the logarithmic loss (for classification) and the mean squared error (for regression); (iii) illustrating the extent of the problem through an application to a large number (n = 306) of datasets from the public database OpenML; (iv) finally arguing in favor of setting it to a computationally feasible large number, depending on convergence properties of the desired performance measure.
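
One way to see how the error rate can fall and then rise with T while the Brier score keeps improving is a simplified model, which is an assumption of this sketch and not necessarily the paper's exact setting: every tree votes independently, and for a given observation each tree is correct with a fixed probability. The group weights and probabilities below (80% "easy" observations at 0.9, 20% "hard" observations at 0.45) are hypothetical, as are the function names.

```python
from math import comb

def majority_error(n_trees, p_correct):
    """P(majority vote is wrong) when each tree is independently correct
    with probability p_correct; ties are broken by a fair coin flip."""
    q = 1.0 - p_correct
    error = 0.0
    for k in range(n_trees + 1):  # k = number of correct votes
        prob = comb(n_trees, k) * p_correct ** k * q ** (n_trees - k)
        if 2 * k < n_trees:
            error += prob
        elif 2 * k == n_trees:
            error += 0.5 * prob
    return error

def expected_brier(n_trees, p_correct):
    """Expected squared gap between the vote fraction for the true class and 1:
    squared bias plus the variance of a binomial proportion."""
    return (1.0 - p_correct) ** 2 + p_correct * (1.0 - p_correct) / n_trees

def mixture(metric, n_trees, groups):
    """Average a per-observation metric over (weight, p_correct) groups."""
    return sum(weight * metric(n_trees, p) for weight, p in groups)

# Hypothetical population: 80% easy observations, 20% hard ones.
groups = [(0.8, 0.9), (0.2, 0.45)]

for t in (1, 11, 51, 501):
    print(t,
          round(mixture(majority_error, t, groups), 4),
          round(mixture(expected_brier, t, groups), 4))
```

Under these assumptions, adding trees quickly removes errors on the easy observations but pushes the hard ones (where a tree is right less than half the time) toward certain misclassification, so the averaged error rate dips and then climbs. The Brier term only loses variance as T grows, so it decreases throughout.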

artificial intelligence, decision tree learning, machine learning, (19 more...)

arXiv.org Machine Learning

1705.05654

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > California > Monterey County > Pacific Grove (0.04)
Europe > Albania > Durrës County > Durrës (0.04)
Asia > India (0.04)

Genre: Research Report > New Finding (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.61)

Forecasting Demand with Limited Information Using Gradient Tree Boosting

Chang, Stephan (Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS)) | Meneguzzi, Felipe (Pontifícia Universidade Católica do Rio Grande do Sul (PUCRS))

AAAI Conferences May-16-2017

Demand forecasting is an important challenge for industries seeking to optimize service quality and expenditures. Generating accurate forecasts is difficult because it depends on the quality of the data available to train predictive models, as well as on the model chosen for the task. We evaluate the approach on two datasets of varying complexity and compare the results with three machine learning algorithms. Results show our approach can outperform these approaches.

forecasting demand, information

AAAI Conferences

The Thirtieth International Flairs Conference

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)

Extreme Gradient Boosting and Preprocessing in Machine Learning – Addendum to predicting flu outcome with R

#artificialintelligence May-11-2017, 10:00:14 GMT

In last week's post I explored whether machine learning models can be applied to predict flu deaths from the 2013 outbreak of influenza A H7N9 in China. There, I compared random forests, elastic-net regularized generalized linear models, k-nearest neighbors, penalized discriminant analysis, stabilized linear discriminant analysis, nearest shrunken centroids, single C5.0 tree and partial least squares. Extreme gradient boosting (XGBoost) is a faster and improved implementation of gradient boosting for supervised learning and has recently been very successfully applied in Kaggle competitions. Because I've heard XGBoost's praise being sung everywhere lately, I wanted to get my feet wet with it too. So this week I want to compare the prediction success of gradient boosting with the same dataset.

artificial intelligence, gradient, machine learning, (17 more...)

#artificialintelligence

Country: Asia > China (0.25)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.56)

majacaci00/data-science-projects

#artificialintelligence May-3-2017, 00:25:25 GMT

This is a sample of the data science projects I have been working on on my own. The Zika Project is an extensive analysis of microcephaly cases related to Zika in Brazil. This case study tries to explain how the following factors influenced the increase in confirmed microcephaly cases linked to Zika reported from February 2016 to May 2016: weather conditions from January 2015 to May 2016; the projected 2015 and 2016 total population of men and women of reproductive age (15-44); the prevalence and growth rate of microcephaly cases; and the sanitation and demographic characteristics of the 27 Brazilian states. To describe and report the variables/features with greater emphasis on microcephaly, the study uses linear regression, lasso and ridge regression, regression trees, random forest regression and a gradient boosting regressor. Another analysis unveils which factors other than elevation and runners' split strategy are better predictors of finishing within the top 15 male and female runners of the 2016 Boston Marathon. In another short analysis, I used an expanded version of the Mincer equation and found that marital status, gender, the student's province of residence and the country where the student pursued his/her postgraduate studies are complementary features to explain the return of income/investment.

decision tree learning, machine learning, microcephaly, (4 more...)

#artificialintelligence

Country:

South America > Brazil (0.28)
North America > United States > California > San Francisco County > San Francisco (0.08)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Epidemiology (0.76)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.61)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.61)

Explaining the Success of AdaBoost and Random Forests as Interpolating Classifiers

Wyner, Abraham J., Olson, Matthew, Bleich, Justin, Mease, David

arXiv.org Machine Learning Apr-29-2017

There is a large literature explaining why AdaBoost is a successful classifier. The literature on AdaBoost focuses on classifier margins and boosting's interpretation as the optimization of an exponential likelihood function. These existing explanations, however, have been pointed out to be incomplete. A random forest is another popular ensemble method for which there is substantially less explanation in the literature. We introduce a novel perspective on AdaBoost and random forests that proposes that the two algorithms work for similar reasons. While both classifiers achieve similar predictive accuracy, random forests cannot be conceived as a direct optimization procedure. Rather, random forests is a self-averaging, interpolating algorithm which creates what we denote as a "spikey-smooth" classifier, and we view AdaBoost in the same light. We conjecture that both AdaBoost and random forests succeed because of this mechanism. We provide a number of examples and some theoretical justification to support this explanation. In the process, we question the conventional wisdom that suggests that boosting algorithms for classification require regularization or early stopping and should be limited to low complexity classes of learners, such as decision stumps. We conclude that boosting should be used like random forests: with large decision trees and without direct regularization or early stopping.
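
For readers unfamiliar with the mechanics being discussed, the following is a minimal pure-Python sketch of AdaBoost on one feature. It uses decision stumps only to keep the code short, whereas the abstract argues for large trees, and it is not the authors' experimental setup. The toy data and function names are assumptions for illustration. The reweighting step multiplies each weight by exp(-alpha * y * h(x)), which is where the exponential-loss interpretation comes from, and on this data the ensemble fits every training point after three rounds.

```python
from math import exp, log

def stump_predict(xi, threshold, polarity):
    """Predict `polarity` for xi < threshold and `-polarity` otherwise."""
    return polarity if xi < threshold else -polarity

def best_stump(x, y, w):
    """Return (weighted_error, threshold, polarity) of the best 1-D stump."""
    values = sorted(set(x))
    thresholds = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for threshold in thresholds:
        for polarity in (1, -1):
            err = sum(wi for xi, yi, wi in zip(x, y, w)
                      if stump_predict(xi, threshold, polarity) != yi)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def adaboost(x, y, rounds):
    """Fit `rounds` weighted stumps; labels must be +1 or -1."""
    n = len(x)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(x, y, w)
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * log((1 - err) / err)     # vote weight of this stump
        ensemble.append((alpha, threshold, polarity))
        # Misclassified points gain weight, correct ones lose weight.
        w = [wi * exp(-alpha * yi * stump_predict(xi, threshold, polarity))
             for xi, yi, wi in zip(x, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, xi):
    """Sign of the alpha-weighted sum of stump votes."""
    score = sum(alpha * stump_predict(xi, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data that no single stump can separate.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

for rounds in (1, 2, 3):
    model = adaboost(x, y, rounds)
    errors = sum(predict(model, xi) != yi for xi, yi in zip(x, y))
    print(rounds, errors)
```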

adaboost, classifier, random forest, (14 more...)

arXiv.org Machine Learning

1504.07676

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
North America > United States > Wisconsin (0.04)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Artificial intelligence can accurately predict future heart disease and strokes, study finds

#artificialintelligence Apr-26-2017, 14:35:37 GMT

Computers that can teach themselves from routine clinical data are potentially better at predicting cardiovascular risk than current standard medical risk models, according to new research at the University of Nottingham. The team of primary care researchers and computer scientists compared a set of standard guidelines from the American College of Cardiology (ACC) with four 'machine-learning' algorithms – these analyse large amounts of data and self-learn patterns within the data to make predictions on future events – in this case, a patient's future risk of having heart disease or a stroke. The results, published in the online journal PLOS ONE, showed that the self-teaching 'artificially intelligent' tools were significantly more accurate in predicting cardiovascular disease than the established algorithm. In computer science, the AI algorithms that were used are called 'random forest', 'logistic regression', 'gradient boosting' and 'neural networks'. Dr Stephen Weng, from the university's NIHR School for Primary Care Research, said: "Cardiovascular disease is the leading cause of illness and death worldwide. Our study shows that artificial intelligence could significantly help in the fight against it by improving the number of patients accurately identified as being at high risk and allowing for early intervention by doctors to prevent serious events like cardiac arrest and stroke. "Current standard prediction models like the ACC are based on eight risk factors including age, cholesterol level and blood pressure but are too simplistic to account for other factors like medications, multiple disease conditions, and other non-traditional biomarkers.

algorithm, future heart disease and stroke, predict future heart disease, (10 more...)

#artificialintelligence

Country: Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.26)

Genre:

Research Report > Experimental Study (0.61)
Research Report > New Finding (0.53)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.57)
