AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
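As a back-of-the-envelope illustration of this definition, consider an idealized case: each constituent classifier is correct with the same probability p, and their errors are independent. Real ensemble members are usually correlated, so real gains are smaller. The probability that a strict majority of n such voters is correct is a binomial tail sum. The function name below is illustrative only.

```python
from math import comb


def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent voters,
    each correct with probability p, is correct (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    k_min = n // 2 + 1  # smallest number of correct votes that forms a majority
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# Accuracy of the ensemble as more 70%-accurate, independent voters are added.
for n in (1, 3, 11, 51):
    print(n, round(majority_vote_accuracy(0.7, n), 3))
```

With p = 0.7, three voters already reach 0.7³ + 3 · 0.7² · 0.3 = 0.784, better than any single member. The independence assumption is what makes this work, and it is exactly what real ensembles can only approximate.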

When Does Deep Learning Work Better Than SVMs or Random Forests?

@machinelearnbot, Jun-3-2017, 15:25:13 GMT

When we tackle a supervised learning problem, my advice is to start with the simplest hypothesis space first, that is, to try a linear model such as logistic regression. If this doesn't work "well" (i.e., it doesn't meet the expectation or performance criterion we defined earlier), I would move on to the next experiment. I would say that random forests are probably THE "worry-free" approach, if such a thing exists in ML: there are no real hyperparameters to tune (except maybe the number of trees; typically, the more trees, the better). In contrast, SVMs have a lot of knobs to turn: choosing the "right" kernel, the regularization penalty, the slack variable, ... Both random forests and SVMs are non-parametric models (i.e., their complexity grows as the number of training samples increases).
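One way to see why "more trees" rarely hurts a random forest is a standard identity. The average of B identically distributed predictions, each with variance σ² and pairwise correlation ρ, has variance ρσ² + (1 − ρ)σ²/B. The sketch below just evaluates that formula. The numbers are hypothetical, and real trees are only approximately identically distributed.

```python
def averaged_variance(sigma2: float, rho: float, n_trees: int) -> float:
    """Variance of the mean of n_trees identically distributed predictions,
    each with variance sigma2 and pairwise correlation rho.

    Derivation: Var(mean) = (1/B^2) * (B*sigma2 + B*(B-1)*rho*sigma2)
                          = rho*sigma2 + (1 - rho)*sigma2 / B
    """
    if n_trees < 1:
        raise ValueError("need at least one tree")
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


# Adding trees only ever lowers the variance; it levels off at rho * sigma2.
for b in (1, 10, 100, 1000):
    print(b, averaged_variance(sigma2=1.0, rho=0.3, n_trees=b))
```

The floor ρσ² is also why random forests try to decorrelate their trees, for example by sampling features, rather than relying on the tree count alone.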

artificial intelligence, machine learning, svm

@machinelearnbot

Country: North America > United States > Michigan (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.62)

Sign Up for Data Science Central

@machinelearnbot, Jun-2-2017, 08:30:18 GMT

artificial intelligence, decision tree learning, machine learning

@machinelearnbot

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

Random Forests explained intuitively

@machinelearnbot, Jun-2-2017, 03:20:15 GMT

Say you have applied for the position of statistical analyst at WalmartLabs. As at most companies, you don't have just one round of interviews; you have several. Each interview is chaired by an independent panel, and the questions asked generally differ from one interview to the next.
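The analogy maps onto a random forest roughly as follows. Each interview panel is a tree trained on its own bootstrap sample of the data, and it looks at a random part of the evidence (a random feature). The final decision is a majority vote across panels. The sketch below is a deliberately simplified, from-scratch illustration: each "tree" is a one-split decision stump on a single randomly chosen feature, whereas real random forests grow deeper trees and sample features at every split. The data and all names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows, labels, feature):
    """Pick the threshold on one feature that minimizes training errors.
    Returns (feature, threshold, left_label, right_label); row[feature] <= threshold goes left."""
    best_errors, best_stump = None, None
    for threshold in sorted({row[feature] for row in rows}):
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best_errors is None or errors < best_errors:
            best_errors, best_stump = errors, (feature, threshold, left_label, right_label)
    return best_stump


def predict_stump(stump, row):
    feature, threshold, left_label, right_label = stump
    return left_label if row[feature] <= threshold else right_label


def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train each 'panel' (stump) on its own bootstrap sample and one random feature."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        feature = rng.randrange(n_features)  # this panel looks at one random feature
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feature))
    return forest


def predict_forest(forest, row):
    """Final decision = majority vote of the individual panels."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]


# Hypothetical toy data: (round-1 score, round-2 score) -> decision.
rows = [(1, 2), (2, 1), (2, 3), (3, 2), (7, 8), (8, 7), (8, 9), (9, 8)]
labels = ["reject"] * 4 + ["hire"] * 4
forest = fit_forest(rows, labels)
print(predict_forest(forest, (9, 9)), predict_forest(forest, (1, 1)))
```

A single panel that happens to see an unrepresentative sample can be wrong, but it is outvoted by the others. That is the intuition the interview analogy is after.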

decision tree learning, interview, machine learning

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.47)

Building Trust in Machine Learning Models (using LIME in Python)

#artificialintelligence, Jun-2-2017, 00:15:53 GMT

The value is not in software; the value is in data, and it is really important for every single company to understand what data it has. More and more companies are now aware of the power of data. Machine learning models are increasing in popularity and are now being used to solve a wide variety of business problems with data. That said, there is always a trade-off between a model's accuracy and its interpretability. In general, to improve accuracy, data scientists have to resort to complex algorithms such as bagging, boosting and random forests, which are "black-box" methods.
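LIME, the tool named in this article's title, explains a single prediction of a black-box model by fitting a simple, interpretable model locally around that prediction. The sketch below shows that core idea in one dimension with the standard library only. It perturbs the input around x0, weights each perturbed point by a Gaussian-shaped proximity kernel, and fits a weighted linear regression whose slope is the "explanation". This is not the `lime` package's API. The function name, the deterministic perturbation grid and the kernel width are illustrative assumptions, and the real library samples perturbations randomly and handles many features.

```python
import math


def local_linear_surrogate(black_box, x0, radius=1.0, n_points=41, kernel_width=0.5):
    """Fit y ~ a + b * (x - x0) near x0 by weighted least squares.
    Returns (a, b): the surrogate's value at x0 and its local slope."""
    # Symmetric grid of perturbations in [-radius, radius] (a simplification of random sampling).
    offsets = [radius * (2 * i / (n_points - 1) - 1) for i in range(n_points)]
    # Points closer to x0 get more weight.
    weights = [math.exp(-(d ** 2) / kernel_width ** 2) for d in offsets]
    ys = [black_box(x0 + d) for d in offsets]

    w_sum = sum(weights)
    d_mean = sum(w * d for w, d in zip(weights, offsets)) / w_sum
    y_mean = sum(w * y for w, y in zip(weights, ys)) / w_sum
    cov = sum(w * (d - d_mean) * (y - y_mean) for w, d, y in zip(weights, offsets, ys))
    var = sum(w * (d - d_mean) ** 2 for w, d in zip(weights, offsets))
    slope = cov / var
    intercept = y_mean - slope * d_mean
    return intercept, slope


# A hypothetical "black box": near x0 = 3 the local explanation is a slope of about 6.
print(local_linear_surrogate(lambda x: x ** 2, 3.0))
```

The surrogate is only faithful near x0. Widening `kernel_width` trades locality for stability, which is the same tension LIME users face when choosing a kernel width.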

algorithm, artificial intelligence, machine learning

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.38)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.35)

How the random forest algorithm works in machine learning | 7wData

#artificialintelligence, Jun-1-2017, 19:15:28 GMT

You are going to learn about one of the most popular classification algorithms: the random forest algorithm. As motivation to go further, here is one of its best advantages: the same algorithm works for both classification and regression. You might be thinking I am kidding, but it's true: we can use the same random forest algorithm for both classification and regression.
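What changes between the two tasks is mainly how the trees' individual outputs are combined (and the splitting criterion used to grow each tree). A common textbook description is a majority vote over predicted labels for classification and an average of predicted values for regression. The minimal sketch below shows only that aggregation step. The function names are illustrative, and the per-tree predictions are hypothetical.

```python
from collections import Counter
from statistics import mean


def aggregate_classification(tree_predictions):
    """Majority vote over the class labels predicted by the individual trees.
    Ties go to the label encountered first (Counter.most_common ordering)."""
    return Counter(tree_predictions).most_common(1)[0][0]


def aggregate_regression(tree_predictions):
    """Average of the numeric values predicted by the individual trees."""
    return mean(tree_predictions)


print(aggregate_classification(["spam", "ham", "spam"]))  # spam
print(aggregate_regression([2.0, 3.0, 4.0]))  # 3.0
```

Some implementations refine the classification rule. For example, scikit-learn's `RandomForestClassifier` averages the trees' predicted class probabilities instead of counting hard votes.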

algorithm, artificial intelligence, machine learning

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Optimization of Tree Ensembles

Mišić, Velibor V.

arXiv.org Machine Learning, May-30-2017

Tree ensemble models such as random forests and boosted trees are among the most widely used and practically successful predictive models in applied machine learning and business analytics. Although such models have been used to make predictions based on exogenous, uncontrollable independent variables, they are increasingly being used to make predictions where the independent variables are controllable and are also decision variables. In this paper, we study the problem of tree ensemble optimization: given a tree ensemble that predicts some dependent variable using controllable independent variables, how should we set these variables so as to maximize the predicted value? We formulate the problem as a mixed-integer optimization problem. We theoretically examine the strength of our formulation, provide a hierarchy of approximate formulations with bounds on approximation quality and exploit the structure of the problem to develop two large-scale solution methods, one based on Benders decomposition and one based on iteratively generating tree split constraints. We test our methodology on real data sets, including two case studies in drug design and customized pricing, and show that our methodology can efficiently solve large-scale instances to near or full optimality, and outperforms solutions obtained by heuristic approaches. In our drug design case, we show how our approach can identify compounds that efficiently trade off predicted performance and novelty with respect to existing, known compounds. In our customized pricing case, we show how our approach can efficiently determine optimal store-level prices under a random forest model that delivers excellent predictive accuracy.
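To make the problem statement concrete: a tree ensemble's prediction is piecewise constant, changing only at split thresholds. With a single controllable variable (say, a price), you can therefore find the exact maximizer by evaluating one point in each cell between consecutive thresholds. The sketch below does exactly that. It is a brute-force enumeration, not the paper's mixed-integer formulation. With many decision variables the number of cells grows combinatorially, which is the motivation for an MIO approach. The tree encoding and the example trees are hypothetical.

```python
def predict_tree(node, x):
    """Walk a tree encoded as ('split', threshold, left, right) or ('leaf', value).
    x <= threshold goes left."""
    while node[0] == "split":
        _, threshold, left, right = node
        node = left if x <= threshold else right
    return node[1]


def thresholds_of(node):
    if node[0] == "leaf":
        return set()
    return {node[1]} | thresholds_of(node[2]) | thresholds_of(node[3])


def maximize_ensemble(trees, lower, upper):
    """Exhaustively maximize the averaged prediction over x in [lower, upper].

    The ensemble is constant on each cell [lower, c1], (c1, c2], ..., (ck, upper],
    so evaluating each cell's right endpoint (plus `lower`) covers every cell.
    """
    cuts = set()
    for tree in trees:
        cuts |= {t for t in thresholds_of(tree) if lower <= t < upper}
    candidates = sorted(cuts | {lower, upper})

    def ensemble(x):
        return sum(predict_tree(tree, x) for tree in trees) / len(trees)

    best_x = max(candidates, key=ensemble)
    return best_x, ensemble(best_x)


# Two hypothetical regression trees over one decision variable x (e.g. a price).
tree_a = ("split", 10.0, ("leaf", 1.0), ("split", 20.0, ("leaf", 5.0), ("leaf", 2.0)))
tree_b = ("split", 15.0, ("leaf", 2.0), ("leaf", 4.0))
print(maximize_ensemble([tree_a, tree_b], 0.0, 30.0))  # (20.0, 4.5)
```

Any x in (15, 20] attains the same optimum. The sketch reports the cell's right endpoint.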

constraint, data mining, machine learning

arXiv.org Machine Learning

1705.10883

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > New York (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Consumer Products & Services (0.92)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Targeted Learning with Daily EHR Data

Sofrygin, Oleg, Zhu, Zheng, Schmittdiel, Julie A, Adams, Alyce S., Grant, Richard W., van der Laan, Mark J., Neugebauer, Romain

arXiv.org Machine Learning, May-27-2017

Electronic health records (EHR) data provide a cost- and time-effective opportunity to conduct cohort studies of the effects of multiple time-point interventions in the diverse patient population found in real-world clinical settings. Because the computational cost of analyzing EHR data at a daily (or more granular) scale can be quite high, a pragmatic approach has been to partition the follow-up into coarser intervals of pre-specified length. Current guidelines suggest employing a 'small' interval, but the feasibility and practical impact of this recommendation have not been evaluated, and no formal methodology to inform this choice has been developed. We start filling these gaps by leveraging large-scale EHR data from a diabetes study to develop and illustrate a fast and scalable targeted learning approach that allows us to follow the current recommendation and study its practical impact on inference. More specifically, we map daily EHR data into four analytic datasets using 90, 30, 15 and 5-day intervals. We apply a semi-parametric and doubly robust estimation approach, the longitudinal TMLE, to estimate the causal effects of four dynamic treatment rules with each dataset, and compare the resulting inferences. To overcome the computational challenges presented by the size of these data, we propose a novel TMLE implementation, the 'long-format TMLE', and rely on the latest advances in scalable data-adaptive machine-learning software, xgboost and h2o, for estimation of the TMLE nuisance parameters.
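The abstract does not spell out how daily records are collapsed into 90-, 30-, 15- and 5-day intervals. The sketch below is one hypothetical way to do it for a single patient's daily indicator (e.g. "treated that day"), using the rule "treated on any day in the interval". Both the aggregation rule and the function name are assumptions for illustration, not the paper's procedure.

```python
def coarsen_daily(daily_values, interval_days, rule=any):
    """Collapse a per-day sequence into consecutive intervals of interval_days days,
    summarizing each interval with `rule` (default: True if any day is truthy).
    The last interval is shorter if the follow-up is not a multiple of interval_days."""
    if interval_days < 1:
        raise ValueError("interval_days must be positive")
    return [rule(daily_values[start:start + interval_days])
            for start in range(0, len(daily_values), interval_days)]


# Hypothetical 30-day follow-up with a single treatment on day 10 (0-indexed).
daily = [0] * 10 + [1] + [0] * 19
for length in (90, 30, 15, 5):
    print(length, coarsen_daily(daily, length))
```

Coarser intervals make the dataset smaller but blur timing: with 15-day intervals the output is `[True, False]`, so the exact day of treatment within the first half-month is lost. That trade-off between computational cost and temporal resolution is the one the study sets out to quantify.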

artificial intelligence, estimation, machine learning

arXiv.org Machine Learning

1705.09874

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > Alameda County > Oakland (0.14)
North America > United States > Hawaii (0.04)
North America > United States > Colorado (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Therapeutic Area > Endocrinology > Diabetes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)

GPU Accelerated XGBoost

#artificialintelligence, May-26-2017, 21:55:17 GMT

He is also the main author of H2O's Deep Learning. Before joining H2O, Arno was a founding Senior MTS at Skytree, where he designed and implemented high-performance machine learning algorithms. He has over a decade of experience in HPC with C++/MPI and had access to the world's largest supercomputers as a Staff Scientist at SLAC National Accelerator Laboratory, where he participated in US DOE scientific computing initiatives and collaborated with CERN on next-generation particle accelerators. Arno holds a PhD and a master's degree summa cum laude in Physics from ETH Zurich, Switzerland. He has authored dozens of scientific papers and is a sought-after conference speaker.

deep learning, gpu accelerated xgboost, machine learning

#artificialintelligence

Country: Europe > Switzerland > Zürich > Zürich (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Levvel Blog - Machine Learning Part Two--Running a Machine Learning Data Store on Redis Labs

#artificialintelligence, May-26-2017, 06:54:50 GMT

Editor's note: This is the second post in a two-part series about machine learning. In part one, we discussed how to get started with machine learning: define, benchmark, and deploy. With the rapid pace of change in the AI/machine learning space, managing large, pre-trained predictive models across an organization and ensuring that the same version is in production can be a challenge. Here, we present an approach that demonstrates how to automate building, storing, and deploying predictive models from a Remote Machine Learning Data Store hosted on Redis Labs. This approach is focused on showing how DevOps CI/CD artifact pipelines can be used to build and manage machine learning model artifacts with Jupyter IPython notebooks, accompanying command-line automation versions, and administration tools to help manage artifacts across a team.
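The core of such a data store is a key scheme that makes "which model version is in production" explicit. The sketch below is a minimal, hypothetical version, not the post's actual pipeline. It pickles a model, stores it under a content-derived version key, and updates a `latest` pointer. To keep it runnable without a server, `InMemoryStore` stands in for Redis. A redis-py `redis.Redis` client also offers `set(name, value)` and `get(name)`, so it could be passed instead, but the key names and helper functions here are illustrative assumptions.

```python
import hashlib
import pickle


class InMemoryStore:
    """Stand-in for a remote key-value store, exposing set/get like a redis-py client."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value
        return True

    def get(self, key):
        return self._data.get(key)


def publish_model(store, name, model):
    """Serialize a model, store it under a content-derived version key,
    and point the name's 'latest' key at that version."""
    blob = pickle.dumps(model)
    version = hashlib.sha256(blob).hexdigest()[:12]
    store.set(f"model:{name}:{version}", blob)
    store.set(f"model:{name}:latest", version)
    return version


def load_model(store, name, version=None):
    """Load a specific version, or whatever 'latest' currently points to."""
    if version is None:
        version = store.get(f"model:{name}:latest")
        if isinstance(version, bytes):  # redis-py returns bytes unless decode_responses=True
            version = version.decode()
    blob = store.get(f"model:{name}:{version}")
    if blob is None:
        raise KeyError(f"no artifact for {name} version {version}")
    return pickle.loads(blob)  # only unpickle artifacts from a trusted store


store = InMemoryStore()
v = publish_model(store, "churn", {"coef": [0.5, -1.2]})  # hypothetical model object
print(v, load_model(store, "churn"))
```

Pinning deployments to an explicit version key, rather than always reading `latest`, is what lets every service in a team confirm it runs the same artifact.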

artificial intelligence, dataset, machine learning

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.31)

Differentiating between AI, machine learning and deep learning

#artificialintelligence, May-24-2017, 20:50:04 GMT

Machine learning is well suited to problem domains typically found in the enterprise, such as making predictions with supervised learning methods. Deep learning is an area of machine learning that has achieved significant progress in certain application areas, including pattern recognition, image classification, natural language processing (NLP) and autonomous driving. Machine learning techniques such as random forests and gradient boosting often perform better than deep learning in the enterprise problem space.

artificial intelligence, inductive learning, machine learning and deep learning

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.76)
