AITopics | Ensemble Learning

Collaborating Authors

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

A random forest system combination approach for error detection in digital dictionaries

Bloodgood, Michael, Ye, Peng, Rodrigues, Paul, Zajic, David, Doermann, David

arXiv.org Machine LearningOct-30-2014

When digitizing a print bilingual dictionary, whether via optical character recognition or manual entry, it is inevitable that errors are introduced into the electronic version that is created. We investigate automating the process of detecting errors in an XML representation of a digitized print dictionary using a hybrid approach that combines rule-based, feature-based, and language model-based methods. We investigate combining methods and show that using random forests is a promising approach. We find that in isolation, unsupervised methods rival the performance of supervised methods. Random forests typically require training data so we investigate how we can apply random forests to combine individual base methods that are themselves unsupervised without requiring large amounts of training data. Experiments reveal empirically that a relatively small amount of data is sufficient and can potentially be further reduced through specific selection criteria.

artificial intelligence, machine learning, random forest, (19 more...)

arXiv.org Machine Learning

1410.8553

Country:

North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Slovenia > Upper Carniola > Municipality of Bled > Bled (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife

Wager, Stefan, Hastie, Trevor, Efron, Bradley

arXiv.org Machine LearningMar-28-2014

We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2012) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B on the order of n^{1.5} bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B on the order of n replicates. Moreover, we show that the IJ estimator requires 1.7 times less bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies.

artificial intelligence, machine learning, variance, (18 more...)

arXiv.org Machine Learning

1311.4555

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > United States > New York (0.04)
North America > United States > North Carolina (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report > New Finding (0.87)

Industry:

Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.47)
Health & Medicine > Therapeutic Area > Oncology (0.46)
Transportation > Ground > Road (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.77)

Add feedback

The Random Forest Kernel and other kernels for big data from random partitions

Davies, Alex, Ghahramani, Zoubin

arXiv.org Machine LearningFeb-18-2014

We present Random Partition Kernels, a new class of kernels derived by demonstrating a natural connection between random partitions of objects and kernels between those objects. We show how the construction can be used to create kernels from methods that would not normally be viewed as random partitions, such as Random Forest. To demonstrate the potential of this method, we propose two new kernels, the Random Forest Kernel and the Fast Cluster Kernel, and show that these kernels consistently outperform standard kernels on problems involving real-world datasets. Finally, we show how the form of these kernels lend themselves to a natural approximation that is appropriate for certain big data problems, allowing $O(N)$ inference in methods such as Gaussian Processes, Support Vector Machines and Kernel PCA.

artificial intelligence, decision tree learning, machine learning, (13 more...)

arXiv.org Machine Learning

1402.4293

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.29)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Add feedback

Narrowing the Gap: Random Forests In Theory and In Practice

Denil, Misha, Matheson, David, de Freitas, Nando

arXiv.org Machine LearningOct-4-2013

Despite widespread interest and practical use, the theoretical properties of random forests are still not well understood. In this paper we contribute to this understanding in two ways. We present a new theoretically tractable variant of random regression forests and prove that our algorithm is consistent. We also provide an empirical evaluation, comparing our algorithm and other theoretically tractable random forest models to the random forest algorithm used in practice. Our experiments provide insight into the relative importance of different simplifications that theoreticians have made to obtain tractable models for analysis.

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Machine Learning

1310.1415

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California (0.04)
North America > Canada > British Columbia (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Add feedback

The AdaBoost Flow

Lykov, A., Muzychka, S., Vaninsky, K.

arXiv.org Machine LearningJun-24-2013

We introduce a dynamical system which we call the AdaBoost flow. The flow is defined by a system of ODEs with control. We show that three algorithms of the AdaBoost family (i) the AdaBoost algorithm of Schapire and Freund (ii) the arc-gv algorithm of Breiman (iii) the confidence rated prediction of Schapire and Singer can be can be embedded in the AdaBoost flow. The nontrivial part of the AdaBoost flow equations coincides with the equations of dynamics of nonperiodic Toda system written in terms of spectral variables. We provide a novel invariant geometrical description of the AdaBoost algorithm as a gradient flow on a foliation defined by level sets of the potential function. We propose a new approach for constructing boosting algorithms as a continuous time gradient flow on measures defined by various metrics and potential functions. Finally we explain similarity of the AdaBoost algorithm with the Perelman's construction for the Ricci flow.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1110.6228

Country:

Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
North America > United States > Washington > King County > Seattle (0.04)
North America > United States > New York (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.76)

Add feedback

Stochastic Aware Random Forests - A Variation Less Impacted by Randomness

Fernandes, Paulo (PUCRS University) | Lopes, Lucelene (PUCRS University) | Normey, Silvio (PUCRS University) | Ruiz, Duncan (PUCRS University)

AAAI ConferencesMay-19-2013

The impact of random choices is important to many ensemble classifiers algorithms, and the Random Forests is particularly sensible to pseudo-random number generation decisions.This paper proposes an extension to the classical Random Forests method that aims to reduce its sensibility to randomness.The benefits brought by such extension are illustrated by a large number of experiments over 32 different public data sets.

impacted, randomness, stochastic aware random forest

AAAI Conferences

The Twenty-Sixth International FLAIRS Conference

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.80)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.80)

Add feedback

Ensemble Models with Trees and Rules

Akdemir, Deniz

arXiv.org Machine LearningAug-23-2012

In this article, we have proposed several approaches for post processing a large ensemble of prediction models or rules. The results from our simulations show that the post processing methods we have considered here are promising. We have used the techniques developed here for estimation of quantitative traits from markers, on the benchmark "Bostob Housing"data set and in some simulations. In most cases, the produced models had better prediction performance than, for example, the ones produced by the random forest or the rulefit algorithms.

artificial intelligence, ensemble, machine learning, (14 more...)

arXiv.org Machine Learning

1112.3699

Country: North America > United States > New York > Tompkins County > Ithaca (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.51)

Add feedback

Generalized Boosting Algorithms for Convex Optimization

Grubb, Alexander, Bagnell, J. Andrew

arXiv.org Machine LearningFeb-14-2012

Boosting is a popular way to derive powerful learners from simpler hypothesis classes. Following previous work (Mason et al., 1999; Friedman, 2000) on general boosting frameworks, we analyze gradient-based descent algorithms for boosting with respect to any convex objective and introduce a new measure of weak learner performance into this setting which generalizes existing work. We present the weak to strong learning guarantees for the existing gradient boosting work for strongly-smooth, strongly-convex objectives under this new measure of performance, and also demonstrate that this work fails for non-smooth objectives. To address this issue, we present new algorithms which extend this boosting approach to arbitrary convex loss functions and give corresponding weak to strong convergence results. In addition, we demonstrate experimental results that support our analysis and demonstrate the need for the new algorithms we present.

algorithm, artificial intelligence, machine learning, (12 more...)

arXiv.org Machine Learning

1105.2054

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
South America > Paraguay > Asunción > Asunción (0.04)
(3 more...)

Genre: Research Report (0.64)

Industry: Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.34)

Add feedback

Random Forests for Metric Learning with Implicit Pairwise Position Dependence

Xiong, Caiming, Johnson, David, Xu, Ran, Corso, Jason J.

arXiv.org Machine LearningJan-3-2012

Metric learning makes it plausible to learn distances for complex distributions of data from labeled data. However, to date, most metric learning methods are based on a single Mahalanobis metric, which cannot handle heterogeneous data well. Those that learn multiple metrics throughout the space have demonstrated superior accuracy, but at the cost of computational efficiency. Here, we take a new angle to the metric learning problem and learn a single metric that is able to implicitly adapt its distance function throughout the feature space. This metric adaptation is accomplished by using a random forest-based classifier to underpin the distance function and incorporate both absolute pairwise position and standard relative position into the representation. We have implemented and tested our method against state of the art global and multi-metric methods on a variety of data sets. Overall, the proposed method outperforms both types of methods in terms of accuracy (consistently ranked first) and is an order of magnitude faster than state of the art multi-metric methods (16x faster in the worst case).

artificial intelligence, machine learning, metric, (14 more...)

arXiv.org Machine Learning

1201.061

Country:

Asia > Middle East > Lebanon (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)

Add feedback

Dimension Reduction Using Rule Ensemble Machine Learning Methods: A Numerical Study of Three Ensemble Methods

DeMasi, Orianna, Meza, Juan, Bailey, David H.

arXiv.org Machine LearningAug-30-2011

Ensemble methods for supervised machine learning have become popular due to their ability to accurately predict class labels with groups of simple, lightweight "base learners." While ensembles offer computationally efficient models that have good predictive capability they tend to be large and offer little insight into the patterns or structure in a dataset. We consider an ensemble technique that returns a model of ranked rules. The model accurately predicts class labels and has the advantage of indicating which parameter constraints are most useful for predicting those labels. An example of the rule ensemble method successfully ranking rules and selecting attributes is given with a dataset containing images of potential supernovas where the number of necessary features is reduced from 39 to 21. We also compare the rule ensemble method on a set of multi-class problems with boosting and bagging, which are two well known ensemble techniques that use decision trees as base learners, but do not have a rule ranking scheme.

artificial intelligence, ensemble method, machine learning, (15 more...)

arXiv.org Machine Learning

1108.6094

Country: North America > United States > California > Alameda County > Berkeley (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Add feedback