AITopics | Ensemble Learning

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)
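As a rough illustration of the definition above, the sketch below combines three imperfect classifiers by majority vote. The labels and model outputs are hypothetical toy data chosen so that the models err on different items; real ensembles give no such guarantee when their errors are correlated.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label among the individual model predictions."""
    return Counter(predictions).most_common(1)[0][0]


def accuracy(y_pred, y_ref):
    """Fraction of positions where the prediction matches the reference."""
    return sum(p == t for p, t in zip(y_pred, y_ref)) / len(y_ref)


# Hypothetical toy data: true labels and the outputs of three imperfect models.
y_true = [1, 0, 1, 1, 0, 0]
model_a = [1, 0, 0, 1, 0, 1]  # wrong on items 2 and 5
model_b = [0, 0, 1, 1, 0, 0]  # wrong on item 0
model_c = [1, 1, 1, 1, 0, 0]  # wrong on item 1

# Vote item by item across the three models.
ensemble = [majority_vote(votes) for votes in zip(model_a, model_b, model_c)]

for name, preds in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, round(accuracy(preds, y_true), 2))
```

Because no two models are wrong on the same item here, the vote recovers every label even though each constituent model makes at least one mistake.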

Mondrian Forests: Efficient Online Random Forests

Lakshminarayanan, Balaji, Roy, Daniel M., Teh, Yee Whye

Neural Information Processing Systems | Feb-14-2020, 11:55:47 GMT

Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics. Random forests achieve competitive predictive performance and are computationally efficient to train and test, making them excellent candidates for real-world prediction tasks. The most popular random forest variants (such as Breiman's random forest and extremely randomized trees) operate on batches of training data. Online methods are now in greater demand. Existing online random forests, however, require more training data than their batch counterpart to achieve comparable predictive performance.

decision tree, efficient online random forest, mondrian forest, (1 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)

Online Gradient Boosting

Beygelzimer, Alina, Hazan, Elad, Kale, Satyen, Luo, Haipeng

Neural Information Processing Systems | Feb-14-2020, 11:43:45 GMT

We extend the theory of boosting for regression problems to the online learning setting. Generalizing from the batch setting for boosting, the notion of a weak learning algorithm is modeled as an online learning algorithm with linear loss functions that competes with a base class of regression functions, while a strong learning algorithm is an online learning algorithm with smooth convex loss functions that competes with a larger class of regression functions. Our main result is an online gradient boosting algorithm which converts a weak online learning algorithm into a strong one where the larger class of functions is the linear span of the base class. We also give a simpler boosting algorithm that converts a weak online learning algorithm into a strong one where the larger class of functions is the convex hull of the base class, and prove its optimality. Papers published at the Neural Information Processing Systems Conference.

algorithm, larger class, online, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.66)

Pruning Random Forests for Prediction on a Budget

Nan, Feng, Wang, Joseph, Saligrama, Venkatesh

Neural Information Processing Systems | Feb-14-2020, 11:12:41 GMT

We propose to prune a random forest (RF) for resource-constrained prediction. We first construct an RF and then prune it to optimize expected feature cost and accuracy. We pose pruning RFs as a novel 0-1 integer program with linear constraints that encourages feature re-use. We establish total unimodularity of the constraint set to prove that the corresponding LP relaxation solves the original integer program. We then exploit connections to combinatorial optimization and develop an efficient primal-dual algorithm, scalable to large datasets.

integer program, prediction, pruning random forest, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.67)
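To make the notions of feature cost and feature re-use in the pruning entry above concrete, here is a minimal sketch. It assumes a cost model in which each distinct feature is paid for once per example, however many trees test it. The tree encoding, feature names, costs, and the hand-pruned forest are all hypothetical; the pruning is done by hand here, not by the integer program or primal-dual algorithm the abstract describes.

```python
def predict_with_cost(forest, x, feature_cost):
    """Average the tree outputs and charge each distinct feature once."""
    used = set()  # features acquired so far for this example
    total = 0.0
    for tree in forest:
        node = tree
        while "value" not in node:  # descend until a leaf is reached
            feature = node["feature"]
            used.add(feature)
            branch = "left" if x[feature] <= node["threshold"] else "right"
            node = node[branch]
        total += node["value"]
    cost = sum(feature_cost[f] for f in used)
    return total / len(forest), cost


def leaf(value):
    return {"value": value}


def split(feature, threshold, left, right):
    return {"feature": feature, "threshold": threshold, "left": left, "right": right}


# Hypothetical costs: "age" is cheap to obtain, "mri" is expensive.
feature_cost = {"age": 1.0, "mri": 50.0}

tree_1 = split("age", 40, leaf(0.0), split("mri", 0.5, leaf(0.0), leaf(1.0)))
tree_2 = split("age", 60, leaf(0.0), leaf(1.0))
full_forest = [tree_1, tree_2]

# Hand-pruned variant: the expensive "mri" subtree is collapsed into one leaf.
pruned_tree_1 = split("age", 40, leaf(0.0), leaf(0.5))
pruned_forest = [pruned_tree_1, tree_2]

x = {"age": 50, "mri": 0.9}
print(predict_with_cost(full_forest, x, feature_cost))
print(predict_with_cost(pruned_forest, x, feature_cost))
```

Both trees test "age", but it is charged only once, which is why shared features make a forest cheaper to evaluate. Collapsing the "mri" subtree changes the prediction for this example while cutting the acquisition cost sharply, which is the trade-off a budgeted pruning method has to balance.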

Cost efficient gradient boosting

Peter, Sven, Diego, Ferran, Hamprecht, Fred A., Nadler, Boaz

Neural Information Processing Systems | Feb-14-2020, 08:25:38 GMT

Many applications require learning classifiers or regressors that are both accurate and cheap to evaluate. Prediction cost can be drastically reduced if the learned predictor is constructed such that on the majority of the inputs, it uses cheap features and fast evaluations. The main challenge is to do so with little loss in accuracy. In this work we propose a budget-aware strategy based on deep boosted regression trees. In contrast to previous approaches to learning with cost penalties, our method can grow very deep trees that on average are nonetheless cheap to compute.

cost efficient gradient

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.46)

ENIGMA Anonymous: Symbol-Independent Inference Guiding Machine (system description)

Jakubův, Jan, Chvalovský, Karel, Olšák, Miroslav, Piotrowski, Bartosz, Suda, Martin, Urban, Josef

arXiv.org Artificial Intelligence | Feb-13-2020

We describe an implementation of gradient boosting and neural guidance of saturation-style automated theorem provers that does not depend on consistent symbol names across problems. For the gradient-boosting guidance, we manually create abstracted features by considering arity-based encodings of formulas. For the neural guidance, we use symbol-independent graph neural networks and their embedding of the terms and clauses. The two methods are efficiently implemented in the E prover and its ENIGMA learning-guided framework and evaluated on the MPTP large-theory benchmark. Both methods are shown to achieve comparable real-time performance to state-of-the-art symbol-based methods.

cezary kaliszyk, guidance, iteration, (15 more...)

arXiv.org Artificial Intelligence

2002.05406

Country:

Europe > Austria > Vienna (0.14)
South America > Brazil > Rio Grande do Norte > Natal (0.04)
Oceania > Fiji > Central Division > Suva (0.04)
(8 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Logic & Formal Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.89)

Bagging and Random Forest for Imbalanced Classification

#artificialintelligence | Feb-12-2020, 06:05:36 GMT

Bagging is an ensemble algorithm that fits multiple models on different subsets of a training dataset, then combines the predictions from all models. Random forest is an extension of bagging that also randomly selects subsets of features used in each data sample. Both bagging and random forests have proven effective on a wide range of different predictive modeling problems. Although effective, they are not suited to classification problems with a skewed class distribution. Nevertheless, many modifications to the algorithms have been proposed that adapt their behavior and make them better suited to a severe class imbalance. In this tutorial, you will discover how to use bagging and random forest for imbalanced classification.

algorithm, dataset, majority class, (14 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
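One commonly proposed modification of bagging for skewed classes is to undersample the majority class inside every bootstrap sample, so each base model trains on balanced data. The sketch below illustrates that idea under stated assumptions: the function names are hypothetical, the data is a synthetic 95-to-5 toy set, and a nearest-centroid rule stands in for the decision trees a real bagging or random forest model would use. It is not the code from the tutorial listed above.

```python
import random
from collections import Counter


def balanced_bootstrap(X, y, rng):
    """Draw a bootstrap sample with the same number of rows from each class."""
    by_class = {}
    for row, label in zip(X, y):
        by_class.setdefault(label, []).append(row)
    n_min = min(len(rows) for rows in by_class.values())  # minority class size
    sample_X, sample_y = [], []
    for label, rows in by_class.items():
        for row in rng.choices(rows, k=n_min):  # sample with replacement
            sample_X.append(row)
            sample_y.append(label)
    return sample_X, sample_y


def fit_centroids(X, y):
    """Toy base learner (stand-in for a tree): one mean vector per class."""
    counts = Counter(y)
    sums = {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(model, row):
    """Return the label whose centroid is closest in squared distance."""
    def dist2(centroid):
        return sum((a - b) ** 2 for a, b in zip(row, centroid))
    return min(model, key=lambda label: dist2(model[label]))


def fit_balanced_bagging(X, y, n_models=11, seed=0):
    rng = random.Random(seed)
    return [fit_centroids(*balanced_bootstrap(X, y, rng)) for _ in range(n_models)]


def predict_bagging(models, row):
    """Combine the base models by majority vote."""
    votes = Counter(predict_centroids(m, row) for m in models)
    return votes.most_common(1)[0][0]


# Synthetic imbalanced data: 95 majority rows near the origin, 5 minority rows near (3, 3).
X = [[(i % 10) * 0.1, (i // 10) * 0.1] for i in range(95)]
X += [[3.0 + 0.1 * i, 3.0 - 0.1 * i] for i in range(5)]
y = [0] * 95 + [1] * 5

models = fit_balanced_bagging(X, y)
print(predict_bagging(models, [3.0, 3.0]), predict_bagging(models, [0.5, 0.5]))
```

Sampling `n_min` rows per class means every base model sees the minority class as often as the majority class, at the price of discarding most majority rows in any single sample. The ensemble compensates because different models see different majority subsets.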

Machine Learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence

Raschka, Sebastian, Patterson, Joshua, Nolet, Corey

arXiv.org Machine Learning | Feb-12-2020

Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline. At the core of this revolution lie the tools and the methods that are driving it, from processing the massive piles of data generated each day to learning from and taking useful action. Deep neural networks, along with advancements in classical ML and scalable general-purpose GPU computing, have become critical components of artificial intelligence, enabling many of these astounding breakthroughs and lowering the barrier to adoption. Python continues to be the most preferred language for scientific computing, data science, and machine learning, boosting both performance and productivity by enabling the use of low-level libraries and clean high-level APIs. This survey offers insight into the field of machine learning with Python, taking a tour through important topics to identify some of the core hardware and software paradigms that have enabled it. We cover widely-used libraries and concepts, collected together for holistic comparison, with the goal of educating the reader and driving the field of Python machine learning forward.

algorithm, arxiv preprint arxiv, library, (15 more...)

arXiv.org Machine Learning

2002.04803

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > Maryland > Baltimore County (0.04)
North America > United States > Maryland > Baltimore (0.04)
Asia > Middle East > Jordan (0.04)

Genre:

Overview (1.00)
Research Report > Promising Solution (0.67)

Industry:

Information Technology > Security & Privacy (0.68)
Leisure & Entertainment > Games (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.93)
(4 more...)

Stochastic tree ensembles for regularized nonlinear regression

He, Jingyu, Hahn, P. Richard

arXiv.org Machine Learning | Feb-9-2020

Tree-based algorithms for supervised learning, such as Classification and Regression Trees (CART) (Breiman et al., 1984), random forests (Breiman, 1996, 2001), AdaBoost (Freund and Schapire, 1997), and gradient boosting (Breiman, 1997; Friedman, 2001, 2002), are widely used for applied supervised learning. As a whole, these methods are popular in applied settings due to their speed and accuracy in mean estimation and out-of-sample prediction tasks. One limitation of such methods is their well-known sensitivity to tuning parameters, which require costly cross-validation to optimize. Bayesian additive regression trees (BART) (Chipman et al., 2007, 2010) is a popular model-based alternative that is often more accurate than other tree-based methods; specifically, BART boasts valuable robustness to the choice of tuning parameters. However, relative to random forests and boosting, BART's wider adoption has been slowed by its more severe computational demands, owing to its reliance on a random walk Metropolis-Hastings Markov chain Monte Carlo (MCMC) algorithm. Despite this limitation, BART has inspired a considerable body of research in recent years.

algorithm, node, split criterion, (14 more...)

arXiv.org Machine Learning

2002.03375

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California (0.04)
North America > United States > Arizona (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
(2 more...)

Gain State-Of-The-Art Results on Tabular Data with Deep Learning & Embedding Layers [A How To Guide]

#artificialintelligence | Feb-5-2020, 08:42:50 GMT

Tree-based models like Random Forest and XGBoost have become very popular for solving tabular (structured) data problems and have gained a lot of traction in Kaggle competitions lately, for good reason. However, in this article, I want to introduce a different approach from fast.ai's ...

databunch, deep learning, validation, (13 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Additive Tree Ensembles: Reasoning About Potential Instances

Devos, Laurens, Meert, Wannes, Davis, Jesse

arXiv.org Artificial Intelligence | Jan-31-2020

Imagine being able to ask a black-box model questions such as "Which adversarial examples exist?", "Does a specific attribute have a disproportionate effect on the model's prediction?" or "What kind of predictions are possible for a partially described example?" This last question is particularly important if your partial description does not correspond to any observed example in your data, as it provides insight into how the model will extrapolate to unseen data. These capabilities would be extremely helpful as they would allow a user to better understand the model's behavior, particularly as it relates to issues such as robustness, fairness, and bias. In this paper, we propose such an approach for an ensemble of trees. Since, in general, this task is intractable, we present a strategy that (1) can prune part of the input space given the question asked to simplify the problem; and (2) follows a divide and conquer approach that is incremental and can always return some answers and indicates which parts of the input domains are still uncertain. The usefulness of our approach is shown on a diverse set of use cases.

constraint, ensemble, smt solver, (13 more...)

arXiv.org Artificial Intelligence

2001.11905

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.70)
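The question "What kind of predictions are possible for a partially described example?" from the entry above can be approximated naively by bounding each tree separately. The sketch below is not the authors' method (which prunes the input space and uses divide and conquer); it only sums per-tree minimum and maximum reachable leaf values, which yields valid but possibly loose outer bounds. The tree encoding, feature names, and values are hypothetical.

```python
def leaf_range(node, partial):
    """Min and max leaf value reachable in one tree given the known features."""
    if "value" in node:
        return node["value"], node["value"]
    feature, threshold = node["feature"], node["threshold"]
    if feature in partial:
        # Known feature: only one branch is reachable.
        branch = "left" if partial[feature] <= threshold else "right"
        return leaf_range(node[branch], partial)
    # Unknown feature: both branches are reachable.
    lo_left, hi_left = leaf_range(node["left"], partial)
    lo_right, hi_right = leaf_range(node["right"], partial)
    return min(lo_left, lo_right), max(hi_left, hi_right)


def ensemble_bounds(trees, partial):
    """Outer bounds on the additive ensemble output (sum of tree outputs)."""
    ranges = [leaf_range(tree, partial) for tree in trees]
    return sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges)


def leaf(value):
    return {"value": value}


def split(feature, threshold, left, right):
    return {"feature": feature, "threshold": threshold, "left": left, "right": right}


# Hypothetical additive ensemble of two trees over "income" and "age".
trees = [
    split("income", 50, leaf(-1.0), split("age", 30, leaf(0.5), leaf(2.0))),
    split("age", 30, leaf(1.0), leaf(-0.5)),
]

print(ensemble_bounds(trees, {"income": 80}))             # age unknown
print(ensemble_bounds(trees, {"income": 80, "age": 25}))  # fully described
```

With only `income` known, the bounds come out as 0.0 to 3.0, yet every consistent choice of `age` gives exactly 1.5 in this toy ensemble. The gap arises because independent per-tree bounds ignore that both trees must see the same `age`, which is the kind of cross-tree reasoning a dedicated approach has to handle.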