AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

Random Forests for Big Data

Genuer, Robin; Poggi, Jean-Michel; Tuleau-Malot, Christine; Villa-Vialaneix, Nathalie

arXiv.org Machine Learning | Mar-22-2017

Big Data is one of the major challenges of statistical science and has numerous consequences from algorithmic and theoretical viewpoints. Big Data always involves massive data, but it often also includes online data and data heterogeneity. Recently, some statistical methods have been adapted to process Big Data, such as linear regression models, clustering methods and bootstrapping schemes. Based on decision trees combined with aggregation and bootstrap ideas, random forests were introduced by Breiman in 2001. They are a powerful nonparametric statistical method that handles regression problems, as well as two-class and multi-class classification problems, in a single versatile framework. Focusing on classification problems, this paper proposes a selective review of available proposals that deal with scaling random forests to Big Data problems. These proposals rely on parallel environments or on online adaptations of random forests. We also describe how related quantities, such as out-of-bag error and variable importance, are addressed in these methods. Then, we formulate various remarks for random forests in the Big Data context. Finally, we experiment with five variants on two massive datasets (15 and 120 million observations), one simulated and one from real-world data. One variant relies on subsampling, while three others are related to parallel implementations of random forests and involve either various adaptations of the bootstrap to Big Data or "divide-and-conquer" approaches. The fifth variant relates to online learning of random forests. These numerical experiments highlight the relative performance of the different variants, as well as some of their limitations.
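
To make the bootstrap and out-of-bag (OOB) ideas in this abstract concrete, here is a minimal sketch of bagging with an OOB error estimate. It is not the paper's code: one-feature decision stumps stand in for full trees, the random feature selection that distinguishes a random forest from plain bagging is left out, and the toy data, function names and parameters are all assumptions made for illustration.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Return the threshold t with the fewest training errors for 'predict 1 if x > t'."""
    best_t, best_err = None, None
    for t in sorted(set(xs)):
        err = sum((x > t) != bool(y) for x, y in zip(xs, ys))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_stumps_with_oob(xs, ys, n_trees=25, seed=0):
    """Fit stumps on bootstrap samples and estimate error on the left-out points."""
    rng = random.Random(seed)
    n = len(xs)
    thresholds = []
    oob_votes = [Counter() for _ in range(n)]
    for _ in range(n_trees):
        # Bootstrap: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        t = fit_stump([xs[i] for i in idx], [ys[i] for i in idx])
        thresholds.append(t)
        # Points never drawn are "out of bag" for this stump and may vote.
        for i in set(range(n)) - set(idx):
            oob_votes[i][int(xs[i] > t)] += 1
    # Each point is judged only by stumps that did not train on it.
    scored = [(votes.most_common(1)[0][0], ys[i])
              for i, votes in enumerate(oob_votes) if votes]
    oob_error = sum(pred != y for pred, y in scored) / len(scored)
    return thresholds, oob_error

# Hypothetical toy data: the label is 1 exactly when x exceeds 0.5.
data_rng = random.Random(1)
xs = [data_rng.random() for _ in range(200)]
ys = [int(x > 0.5) for x in xs]

thresholds, oob_error = bagged_stumps_with_oob(xs, ys)
print(f"{len(thresholds)} stumps, OOB error = {oob_error:.3f}")
```

Because each bootstrap sample leaves out roughly a third of the observations, the OOB estimate needs no separate validation set. That is part of why its behaviour under subsampling or divide-and-conquer schemes matters at Big Data scale.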

artificial intelligence, data mining, machine learning, (19 more...)

1511.08327

Country:

Europe > Austria > Vienna (0.14)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
(7 more...)

Genre: Research Report (1.00)

Industry: Education (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
(2 more...)

Winning Tips on Machine Learning Competitions by Kazanova, Current Kaggle #3

#artificialintelligence | Mar-17-2017, 11:25:30 GMT

No matter how many books you read, tutorials you finish or problems you solve, there will always be a data set that leaves you clueless, especially in your early days of machine learning. In this blog post, you'll learn some essential tips on building machine learning models, which most people learn only with experience. These tips were shared by Marios Michailidis (a.k.a. Kazanova), Kaggle Grandmaster, current rank #3, in a webinar held on 5th March 2016. The key to succeeding in competitions is perseverance. Marios said, 'I won my first competition (Acquire Valued Shoppers Challenge) and entered Kaggle's top 20 after a year of continued participation on a 4 GB RAM laptop (i3)'. Were you planning to give up? While reading the Q&As, if you have any questions, please feel free to drop them in the comments!

artificial intelligence, competition, machine learning, (13 more...)

Country: North America > United States > Utah (0.04)

Genre: Personal > Interview (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.47)

A Simple XGBoost Tutorial Using the Iris Dataset

#artificialintelligence | Mar-9-2017, 14:20:52 GMT

I had the opportunity to start using the xgboost machine learning algorithm; it is fast and shows good results. Here I will be using multiclass prediction with the iris dataset from scikit-learn. In order to work with the data, I need to install various scientific libraries for Python. The best way I have found is to use Anaconda. It simply installs all the libraries and helps to install new ones.

artificial intelligence, machine learning, simple xgboost tutorial, (5 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.80)

Tuning hyperparams fast with Hyperband - FastML

#artificialintelligence | Mar-6-2017, 03:15:41 GMT

Hyperband is a relatively new method for tuning iterative algorithms. It performs random sampling and attempts to gain an edge by making the best use of the time spent optimizing. We explain a few things that were not clear to us right away, and try the algorithm in practice. Candidates for tuning with Hyperband include all the SGD derivatives, meaning the whole of deep learning, and tree ensembles: gradient boosting and, perhaps to a lesser extent, random forest and extremely randomized trees. To quantify this idea, we compare against random search run at twice the speed, which beats the two Bayesian optimization methods; that is, running random search for twice as long yields superior results.
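
The idea of spending optimization time where it pays off can be sketched with successive halving, the routine Hyperband is built on. The sketch below covers one bracket only (Hyperband itself runs several brackets with different trade-offs between the number of configurations and the starting budget). The loss function, the parameter names and the choice of `eta=3` are assumptions for illustration, not the article's code.

```python
import random

def toy_loss(learning_rate, budget):
    """Hypothetical validation loss after `budget` training iterations.

    Best near learning_rate = 0.1, and improving as the budget grows.
    """
    return (learning_rate - 0.1) ** 2 + 1.0 / budget

def successive_halving(configs, min_budget=1, eta=3):
    """Repeatedly keep the best 1/eta of the configs and give them eta times more budget."""
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        # Evaluate every surviving config at the current (cheap) budget.
        ranked = sorted(survivors, key=lambda c: toy_loss(c, budget))
        # Discard all but the top 1/eta; always keep at least one.
        survivors = ranked[:max(1, len(ranked) // eta)]
        budget *= eta
    return survivors[0]

# Random sampling of candidate configurations, as in random search.
rng = random.Random(0)
configs = [rng.uniform(0.0, 1.0) for _ in range(27)]

best = successive_halving(configs)
print(f"best learning rate found: {best:.3f}")
```

With 27 candidates and `eta=3`, the rounds keep 9, then 3, then 1 configuration, so most candidates only ever consume the smallest budget.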

artificial intelligence, iteration, machine learning, (16 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.71)

Indoor Localization by Fusing a Group of Fingerprints Based on Random Forests

Guo, Xiansheng; Ansari, Nirwan; Li, Huiyong

arXiv.org Machine Learning | Mar-6-2017

Indoor localization based on SIngle Of Fingerprint (SIOF) is rather susceptible to the changing environment, multipath, and non-line-of-sight (NLOS) propagation. Building SIOF is also a very time-consuming process. Recently, we first proposed a GrOup Of Fingerprints (GOOF) to improve the localization accuracy and reduce the burden of building fingerprints. However, its main drawback is timeliness. In this paper, we propose a novel localization framework by Fusing A Group Of fingerprinTs (FAGOT) based on random forests. In the offline phase, we first build a GOOF from different transformations of the received signals of multiple antennas. Then, we design multiple GOOF strong classifiers based on Random Forests (GOOF-RF) by training on each fingerprint in the GOOF. In the online phase, we input the corresponding transformations of the real measurements into these strong classifiers to obtain multiple independent decisions. Finally, we propose a Sliding Window aIded Mode-based (SWIM) fusion algorithm to balance localization accuracy and time. Our proposed approaches can work better in an unknown indoor scenario. The burden of building fingerprints can also be reduced drastically. We demonstrate the performance of our algorithms through simulations and real experimental data using two Universal Software Radio Peripheral (USRP) platforms.

artificial intelligence, decision tree learning, machine learning, (20 more...)

1703.02185

Country: Asia > China (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine > Health Care Technology > Telehealth (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.84)

Day05: Recognition with random forest

#artificialintelligence | Mar-2-2017, 07:51:22 GMT

Day05 brings another Kaggle competition, "Digit Recognizer". The goal in this competition is to take an image of a handwritten single digit, and determine what that digit is. The Jupyter Notebook for this little project is found here. Today, I think I'll use a random forest classifier. A random forest classifier is like an election where the outcome with the most votes (from the ensemble of decision trees) is the predicted classification.
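
The election analogy can be sketched in a few lines. This is only the vote-counting step, with made-up per-tree predictions. The function name and the tie-breaking rule are assumptions, and real implementations may differ (some average class probabilities rather than counting hard votes).

```python
from collections import Counter

def majority_vote(tree_predictions):
    """Return the label predicted by the most trees (ties go to the label seen first)."""
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical predictions from seven trees for one handwritten digit image.
votes = [3, 8, 3, 5, 3, 8, 3]
predicted_digit = majority_vote(votes)
print(predicted_digit)  # 3, which received four of the seven votes
```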

artificial intelligence, machine learning, random forest, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.95)

Tuning hyperparams fast with Hyperband

#artificialintelligence | Feb-27-2017, 16:15:27 GMT

Hyperband is a method for tuning iterative algorithms. It uses random sampling and attempts to gain an edge by making the best use of the time spent optimizing. We explain a few things that were not clear to us right away, and try the algorithm in practice. Come back later for la version final. For us, the name conjures an idea of some fancy topological construct.

artificial intelligence, iteration, machine learning, (12 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.31)

rxNeuralNet vs. xgBoost vs. H2O

#artificialintelligence | Feb-20-2017, 22:50:18 GMT

Recently, I did a session at a local user group in Ljubljana, Slovenia, where I introduced the new algorithms that are available with the MicrosoftML package for Microsoft R Server 9.0.3. For datasets, I used two from (still currently) running sessions on Kaggle. In the last part, I did image detection and prediction on the MNIST dataset and compared the performance and accuracy between them. The MNIST handwritten digit database is available here. Starting off with rxNeuralNet, we have to build a NET# model or neural network to work its way.

accuracy, artificial intelligence, machine learning, (7 more...)

Country: Europe > Slovenia > Central Slovenia > Municipality of Ljubljana > Ljubljana (0.29)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)

Extreme Gradient Boosting and Behavioral Biometrics

Manning, Benjamin (University of Georgia)

AAAI Conferences | Feb-14-2017

As insider hacks become more prevalent, it is becoming more useful to identify valid users from the inside of a system rather than from the usual external entry points where exploits are used to gain entry. One of the main goals of this study was to ascertain how well Gradient Boosting could be used for prediction or, in this case, classification or identification of a specific user through the learning of HCI-based behavioral biometrics. If applicable, this procedure could be used to verify users after they have gained entry into a protected system using data that is as human-centric as other biometrics, but less invasive. For this study, an Extreme Gradient Boosting algorithm was used for training and testing on a dataset containing keystroke dynamics information. This specific algorithm was chosen because the majority of current research utilizes mainstream methods such as KNN and SVM, and the hypothesis of this study was centered on the potential applicability of ensemble-related decision or model trees. The final predictive model produced an accuracy of 0.941 with a Kappa value of 0.942, demonstrating that HCI-based behavioral biometrics in the form of keystroke dynamics can be used to identify the users of a system.
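
The abstract reports a Kappa value alongside accuracy but does not say how it was computed. Assuming it refers to Cohen's kappa, a minimal sketch of the statistic is shown below: it rescales observed agreement by the agreement expected by chance. The function name and the toy labels are hypothetical, not data from the study.

```python
from collections import Counter

def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    n = len(y_true)
    # Observed agreement is plain accuracy.
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement: product of the marginal class frequencies, summed over classes.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    # Undefined (division by zero) if chance agreement is exactly 1.
    return (observed - expected) / (1 - expected)

# Hypothetical user labels: three of four predictions are correct.
y_true = ["alice", "alice", "bob", "bob"]
y_pred = ["alice", "alice", "bob", "alice"]
kappa = cohen_kappa(y_true, y_pred)
print(kappa)
```

In this toy case accuracy is 0.75 while kappa is 0.5, because half of the agreement would be expected by chance from the label frequencies alone.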

algorithm, artificial intelligence, machine learning, (15 more...)

Thirty-First AAAI Conference on Artificial Intelligence

Country: North America > United States > Georgia > Clarke County > Athens (0.15)

Genre: Research Report (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.95)

Webinar: Improve Your Regression with CART and Gradient Boosting

@machinelearnbot | Feb-13-2017, 05:40:08 GMT

In this webinar we'll introduce you to a powerful tree-based machine learning algorithm called gradient boosting. Gradient boosting often outperforms linear regression, Random Forests, and CART. Boosted trees automatically handle variable selection, variable interactions, nonlinear relationships, outliers, and missing values. We'll see that CART decision trees are the foundation of gradient boosting and discuss some of the advantages of boosting versus a Random Forest. We will explore the gradient boosting algorithm and discuss the most important modeling parameters like the learning rate, number of terminal nodes, number of trees, loss functions, and more.
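
The parameters named in this description (learning rate, number of terminal nodes, number of trees, loss function) can be seen together in a small sketch of gradient boosting for regression. This is not the webinar's software: it assumes squared-error loss, uses two-terminal-node stumps on a single feature in place of full CART trees, and all names and data are hypothetical.

```python
import math

def fit_regression_stump(xs, targets):
    """Least-squares stump: one split, two terminal nodes, each predicting its mean."""
    best = None
    for t in sorted(set(xs))[:-1]:  # drop the largest value so the right node is never empty
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Add stumps one at a time, each fitted to the current residuals."""
    base = sum(ys) / len(ys)  # initial model: the mean of the targets
    preds = [base] * len(xs)
    stumps, history = [], []
    for _ in range(n_trees):
        # For squared-error loss, the negative gradient is the residual.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        # The learning rate shrinks each stump's contribution.
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]
        history.append(mse(ys, preds))

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict, history

# Hypothetical nonlinear target that a single linear fit would miss.
xs = [i / 20 for i in range(40)]
ys = [math.sin(3 * x) for x in xs]

predict, history = gradient_boost(xs, ys)
print(f"training MSE after 1 tree: {history[0]:.4f}, after 50 trees: {history[-1]:.4f}")
```

A smaller learning rate generally needs more trees to reach the same training error, which is why the two parameters are usually tuned together.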

artificial intelligence, machine learning, regression, (3 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.55)
