AITopics | Ensemble Learning

Collaborating Authors

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python)

#artificialintelligenceAug-29-2017, 12:35:35 GMT

Tree based learning algorithms are considered to be one of the best and mostly used supervised learning methods. Tree based methods empower predictive models with high accuracy, stability and ease of interpretation. Unlike linear models, they map non-linear relationships quite well. They are adaptable at solving any kind of problem at hand (classification or regression). Methods like decision trees, random forest, gradient boosting are being popularly used in all kinds of data science problems. Hence, for every analyst (fresher also), it's important to learn these algorithms and use them for modeling. This tutorial is meant to help beginners learn tree based modeling from scratch. After the successful completion of this tutorial, one is expected to become proficient at using tree based algorithms and build predictive models. Note: This tutorial requires no prior knowledge of machine learning.

algorithm, artificial intelligence, machine learning, (18 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

Add feedback

Ensemble Learning to Improve Machine Learning Results

#artificialintelligenceAug-22-2017, 20:50:29 GMT

Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), bias (boosting), or improve predictions (stacking). Most ensemble methods use a single base learning algorithm to produce homogeneous base learners, i.e. learners of the same type, leading to homogeneous ensembles. There are also some methods that use heterogeneous learners, i.e. learners of different types, leading to heterogeneous ensembles. In order for ensemble methods to be more accurate than any of its individual members, the base learners have to be as accurate as possible and as diverse as possible. Bagging stands for bootstrap aggregation.

artificial intelligence, learner, machine learning, (9 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.41)
Information Technology > Artificial Intelligence > Machine Learning > Memory-Based Learning (0.40)

Add feedback

Quant Trading using Machine Learning - Udemy

@machinelearnbotAug-22-2017, 07:20:12 GMT

Source code (with copious amounts of comments) is attached as a resource with all the code-alongs. Prerequisites: Working knowledge of Python is necessary if you want to run the source code that is provided. Basic knowledge of machine learning, especially ML classification techniques, would be helpful but it's not mandatory. Taught by a Stanford-educated, ex-Googler and an IIT, IIM - educated ex-Flipkart lead analyst. This team has decades of practical experience in quant trading, analytics and e-commerce.

artificial intelligence, machine learning, quant trading, (11 more...)

@machinelearnbot

Genre:

Instructional Material > Online (0.40)
Instructional Material > Course Syllabus & Notes (0.40)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (0.40)
Education > Educational Setting > Online (0.40)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.37)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.33)

Add feedback

Lessons Learned From Benchmarking Fast Machine Learning Algorithms

@machinelearnbotAug-20-2017, 16:00:07 GMT

Boosted decision trees are responsible for more than half of the winning solutions in machine learning challenges hosted at Kaggle, according to KDnuggets. In addition to superior performance, these algorithms have practical appeal as they require minimal tuning. In this post, we evaluate two popular tree boosting software packages: XGBoost and LightGBM, including their GPU implementations. All our code is open-source and can be found in this repo. We will explain the algorithms behind these libraries and evaluate them across different datasets.

artificial intelligence, dataset, machine learning, (14 more...)

@machinelearnbot

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.74)

Add feedback

Fast Gaussian Process Regression for Big Data

Das, Sourish, Roy, Sasanka, Sambasivan, Rajiv

arXiv.org Machine LearningAug-18-2017

Gaussian Processes are widely used for regression tasks. A known limitation in the application of Gaussian Processes to regression tasks is that the computation of the solution requires performing a matrix inversion. The solution also requires the storage of a large matrix in memory. These factors restrict the application of Gaussian Process regression to small and moderate size data sets. We present an algorithm that combines estimates from models developed using subsets of the data obtained in a manner similar to the bootstrap. The sample size is a critical parameter for this algorithm. Guidelines for reasonable choices of algorithm parameters, based on detailed experimental study, are provided. Various techniques have been proposed to scale Gaussian Processes to large scale regression tasks. The most appropriate choice depends on the problem context. The proposed method is most appropriate for problems where an additive model works well and the response depends on a small number of features. The minimax rate of convergence for such problems is attractive and we can build effective models with a small subset of the data. The Stochastic Variational Gaussian Process and the Sparse Gaussian Process are also appropriate choices for such problems. These methods pick a subset of data based on theoretical considerations. The proposed algorithm uses bagging and random sampling. Results from experiments conducted as part of this study indicate that the algorithm presented in this work can be as effective as these methods. Keywords: Big Data, Gaussian Process, Regression 2010 MSC: 00-01, 99-00 1. Introduction Gaussian Processes (GP) are attractive tools to perform supervised learning tasks on complex datasets on which traditional parametric methods may not be effective. They are also easier to use in comparison to alternatives like neural networks ([1]).

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Machine Learning

1509.05142

Country:

North America > United States > California (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > India > Tamil Nadu > Chennai (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.88)

Industry:

Transportation (0.69)
Energy (0.47)
Government (0.46)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Add feedback

Machine Learning: Machine Learning for Beginners. Can machines really learn like humans? All about Artificial Intelligence (A.I), Deep Learning and Digital … Random Forests, Computer Science)

#artificialintelligenceAug-17-2017, 08:55:32 GMT

Today only, get this amazing ebook for just $0.99. Machine learning is currently one of the most talked about concepts in the world of technology and computers. A highly promising topic, machine learning is also quite controversial among people who are not aware of its nature and benefits. Therefore, to do away with such myths and apprehensions, it has become essential for everyone to find out and read about the concept. This book will help you with this mission, as you will find all the required and relevant data regarding machine learning gathered in one single text.

artificial intelligence, decision tree learning, machine learning, (4 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.40)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.40)

Add feedback

When Does Deep Learning Work Better Than SVMs or Random Forests?

@machinelearnbotAug-7-2017, 20:30:20 GMT

Guest blog by Sebastian Raschka, originally posted here. If we tackle a supervised learning problem, my advice is to start with the simplest hypothesis space first. I.e., try a linear model such as logistic regression. If this doesn't work "well" (i.e., it doesn't meet our expectation or performance criterion that we defined earlier), I would move on to the next experiment. I would say that random forests are probably THE "worry-free" approach - if such a thing exists in ML: There are no real hyperparameters to tune (maybe except for the number of trees; typically, the more trees we have the better).

artificial intelligence, machine learning, svm, (16 more...)

@machinelearnbot

Country: North America > United States > Michigan (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.83)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.73)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.60)

Add feedback

[P] Is XGBoost w/ iterating undersampling doable? • r/MachineLearning

@machinelearnbotAug-4-2017, 19:05:08 GMT

I know this might sound like a "google this for me question" but bare with me (I googled it). I'm working with a highly imbalanced data set where the minority class accounts for 1.5% of the total set. This leads to poor predictive performance by most models when nothing is done to address the problem because most algorithms will minimize cost on the majority class, to the detriment of the minority class, when training so as to decrease overall cost. So far I've tried out ANNs,RFs,XGBs, and SVMs and have found that XGB and RF outperform the others in this particular problem, so the remaining post will be about RF and XGB. I've tried penalizing classification on the minority class much more than the majority class to try to fix the imbalance on an algorithmic level but I've found undersampling and then training on the resulting data set to be more effective in my case.

machine learning, machinelearning, social media, (5 more...)

@machinelearnbot

Industry: Media > News (0.40)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Add feedback

KNN Ensembles for Tweedie Regression: The Power of Multiscale Neighborhoods

Farrelly, Colleen M.

arXiv.org Machine LearningJul-29-2017

Very few K-nearest-neighbor (KNN) ensembles exist, despite the efficacy of this approach in regression, classification, and outlier detection. Those that do exist focus on bagging features, rather than varying k or bagging observations; it is unknown whether varying k or bagging observations can improve prediction. Given recent studies from topological data analysis, varying k may function like multiscale topological methods, providing stability and better prediction, as well as increased ensemble diversity. This paper explores 7 KNN ensemble algorithms combining bagged features, bagged observations, and varied k to understand how each of these contribute to model fit. Specifically, these algorithms are tested on Tweedie regression problems through simulations and 6 real datasets; results are compared to state-of-the-art machine learning models including extreme learning machines, random forest, boosted regression, and Morse-Smale regression. Results on simulations suggest gains from varying k above and beyond bagging features or samples, as well as the robustness of KNN ensembles to the curse of dimensionality. KNN regression ensembles perform favorably against state-of-the-art algorithms and dramatically improve performance over KNN regression. Further, real dataset results suggest varying k is a good strategy in general (particularly for difficult Tweedie regression problems) and that KNN regression ensembles often outperform state-of-the-art methods. These results for k-varying ensembles echo recent theoretical results in topological data analysis, where multidimensional filter functions and multiscale coverings provide stability and performance gains over single-dimensional filters and single-scale covering. This opens up the possibility of leveraging multiscale neighborhoods and multiple measures of local geometry in ensemble methods.

artificial intelligence, ensemble, machine learning, (18 more...)

arXiv.org Machine Learning

1708.02122

Country:

Oceania > Australia (0.04)
North America > United States > New York (0.04)
North America > United States > Iowa (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.48)

Industry: Health & Medicine > Therapeutic Area > Psychiatry/Psychology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Add feedback

An Ensemble Boosting Model for Predicting Transfer to the Pediatric Intensive Care Unit

Rubin, Jonathan, Potes, Cristhian, Xu-Wilson, Minnan, Dong, Junzi, Rahman, Asif, Nguyen, Hiep, Moromisato, David

arXiv.org Machine LearningJul-16-2017

Our work focuses on the problem of predicting the transfer of pediatric patients from the general ward of a hospital to the pediatric intensive care unit. Using data collected over 5.5 years from the electronic health records of two medical facilities, we develop classifiers based on adaptive boosting and gradient tree boosting. We further combine these learned classifiers into an ensemble model and compare its performance to a modified pediatric early warning score (PEWS) baseline that relies on expert defined guidelines. To gauge model generalizability, we perform an inter-facility evaluation where we train our algorithm on data from one facility and perform evaluation on a hidden test dataset from a separate facility. We show that improvements are witnessed over the PEWS baseline in accuracy (0.77 vs. 0.69), sensitivity (0.80 vs. 0.68), specificity (0.74 vs. 0.70) and AUROC (0.85 vs. 0.73).

artificial intelligence, classifier, machine learning, (15 more...)

arXiv.org Machine Learning

1707.04958

Country:

North America > United States > Arizona > Maricopa County > Mesa (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Pediatrics/Neonatology (1.00)
Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.87)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.96)

Add feedback