AITopics | Ensemble Learning

Collaborating Authors

Ensemble Learning

Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Sequential Feature Classification in the Context of Redundancies

Pfannschmidt, Lukas, Hammer, Barbara

arXiv.org Machine LearningApr-1-2020

The problem of all-relevant feature selection is concerned with finding a relevant feature set with preserved redundancies. There exist several approximations to solve this problem but only one could give a distinction between strong and weak relevance. This approach was limited to the case of linear problems. In this work, we present a new solution for this distinction in the non-linear case through the use of random forest models and statistical methods.

feature selection, importance value, relevant feature, (14 more...)

arXiv.org Machine Learning

2004.00658

Country: Europe > Germany (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.37)

Add feedback

Machine Learning Algorithms for Financial Asset Price Forecasting

Ndikum, Philip

arXiv.org Machine LearningMar-31-2020

This research paper explores the performance of Machine Learning (ML) algorithms and techniques that can be used for financial asset price forecasting. The prediction and forecasting of asset prices and returns remains one of the most challenging and exciting problems for quantitative finance and practitioners alike. The massive increase in data generated and captured in recent years presents an opportunity to leverage Machine Learning algorithms. This study directly compares and contrasts state-of-the-art implementations of modern Machine Learning algorithms on high performance computing (HPC) infrastructures versus the traditional and highly popular Capital Asset Pricing Model (CAPM) on U.S equities data. The implemented Machine Learning models - trained on time series data for an entire stock universe (in addition to exogenous macroeconomic variables) significantly outperform the CAPM on out-of-sample (OOS) test data.

algorithm, doi, machine learning algorithm, (7 more...)

arXiv.org Machine Learning

2004.01504

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Austria > Styria > Graz (0.04)
(2 more...)

Genre: Research Report (1.00)

Industry:

Law > Business Law (1.00)
Banking & Finance > Trading (1.00)
Banking & Finance > Economy (1.00)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)

Add feedback

Fully-Corrective Gradient Boosting with Squared Hinge: Fast Learning Rates and Early Stopping

Zeng, Jinshan, Zhang, Min, Lin, Shao-Bo

arXiv.org Machine LearningMar-31-2020

Boosting is a well-known method for improving the accuracy of weak learners in machine learning. However, its theoretical generalization guarantee is missing in literature. In this paper, we propose an efficient boosting method with theoretical generalization guarantees for binary classification. Three key ingredients of the proposed boosting method are: a) the \textit{fully-corrective greedy} (FCG) update in the boosting procedure, b) a differentiable \textit{squared hinge} (also called \textit{truncated quadratic}) function as the loss function, and c) an efficient alternating direction method of multipliers (ADMM) algorithm for the associated FCG optimization. The used squared hinge loss not only inherits the robustness of the well-known hinge loss for classification with outliers, but also brings some benefits for computational implementation and theoretical justification. Under some sparseness assumption, we derive a fast learning rate of the order ${\cal O}((m/\log m)^{-1/4})$ for the proposed boosting method, which can be further improved to ${\cal O}((m/\log m)^{-1/2})$ if certain additional noise assumption is imposed, where $m$ is the size of sample set. Both derived learning rates are the best ones among the existing generalization results of boosting-type methods for classification. Moreover, an efficient early stopping scheme is provided for the proposed method. A series of toy simulations and real data experiments are conducted to verify the developed theories and demonstrate the effectiveness of the proposed method.

classification, experiment, loss function, (16 more...)

arXiv.org Machine Learning

2004.00179

Country:

Asia > China > Shaanxi Province > Xi'an (0.04)
North America > United States > Washington > King County > Bellevue (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.64)

Add feedback

Build your first Machine Learning pipeline using scikit-learn!

#artificialintelligenceMar-29-2020, 12:44:57 GMT

For building any machine learning model, it is important to have a sufficient amount of data to train the model. The data is often collected from various resources and might be available in different formats. Due to this reason, data cleaning and preprocessing become a crucial step in the machine learning project. Whenever new data points are added to the existing data, we need to perform the same preprocessing steps again before we can use the machine learning model to make predictions. This becomes a tedious and time-consuming process!

machine learning pipeline, model building process, pipeline, (13 more...)

#artificialintelligence

Genre: Workflow (0.73)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.31)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.31)

Add feedback

Interpretability: Cracking open the black box – Part II

#artificialintelligenceMar-25-2020, 02:47:48 GMT

In the last post in the series, we defined what interpretability is and looked at a few interpretable models and the quirks and'gotchas' in it. Now let's dig deeper into the post-hoc interpretation techniques which is useful when you model itself is not transparent. This resonates with most real world use cases, because whether we like it or not, we get better performance with a black box model. For this exercise, I have chosen the Adult dataset a.k.a Census Income dataset. Census Income is a pretty popular dataset which has demographic information like age, occupation, along with a column which tells us if the income of the particular person 50k or not. We are using this column to run a binary classification using Random Forest.

feature importance, mean decrease, plot show, (17 more...)

#artificialintelligence

Industry: Transportation > Air (0.61)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.38)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.37)

Add feedback

BoostTree and BoostForest for Ensemble Learning

Zhao, Changming, Wu, Dongrui, Huang, Jian, Yuan, Ye, Zhang, Hai-Tao

arXiv.org Machine LearningMar-21-2020

Bootstrap aggregation (Bagging) and boosting are two popular ensemble learning approaches, which combine multiple base learners to generate a composite learner. This article proposes BoostForest, which is an ensemble learning approach using BoostTree as base learners and can be used for both classification and regression. BoostTree constructs a tree by gradient boosting, which trains a linear or nonlinear model at each node. When a new sample comes in, BoostTree first sorts it down to a leaf, then computes the final prediction by summing up the outputs of all models along the path from the root node to that leaf. BoostTree achieves high randomness (diversity) by sampling its parameters randomly from a parameter pool, and selecting a subset of features randomly at node splitting. BoostForest further increases the randomness by bootstrapping the training data in constructing different BoostTrees. BoostForest is compared with four classical ensemble learning approaches on 30 classification and regression datasets, demonstrating that it can generate more accurate and more robust composite learners.

boosttree, dataset, node, (13 more...)

arXiv.org Machine Learning

2003.09737

Country:

North America > Canada > Quebec > Montreal (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
Oceania > Australia > Tasmania (0.04)
(10 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.53)

Add feedback

Minimal Variance Sampling in Stochastic Gradient Boosting

Ibragimov, Bulat, Gusev, Gleb

Neural Information Processing SystemsMar-19-2020, 02:48:22 GMT

Stochastic Gradient Boosting (SGB) is a widely used approach to regularization of boosting models based on decision trees. It was shown that, in many cases, random sampling at each iteration can lead to better generalization performance of the model and can also decrease the learning time. Different sampling approaches were proposed, where probabilities are not uniform, and it is not currently clear which approach is the most effective. In this paper, we formulate the problem of randomization in SGB in terms of optimization of sampling probabilities to maximize the estimation accuracy of split scoring used to train decision trees.This optimization problem has a closed-form nearly optimal solution, and it leads to a new sampling technique, which we call Minimal Variance Sampling (MVS).The method both decreases the number of examples needed for each iteration of boosting and increases the quality of the model significantly as compared to the state-of-the art sampling methods. The superiority of the algorithm was confirmed by introducing MVS as a new default option for subsampling in CatBoost, a gradient boosting library achieving state-of-the-art quality on various machine learning tasks. Papers published at the Neural Information Processing Systems Conference.

Add feedback

Robustness Verification of Tree-based Models

Chen, Hongge, Zhang, Huan, Si, Si, Li, Yang, Boning, Duane, Hsieh, Cho-Jui

Neural Information Processing SystemsMar-19-2020, 01:45:45 GMT

We study the robustness verification problem of tree based models, including random forest (RF) and gradient boosted decision tree (GBDT). Formal robustness verification of decision tree ensembles involves finding the exact minimal adversarial perturbation or a guaranteed lower bound of it. Existing approaches cast this verification problem into a mixed integer linear programming (MILP) problem, which finds the minimal adversarial distortion in exponential time so is impractical for large ensembles. Although this verification problem is NP-complete in general, we give a more precise complexity characterization. We show that there is a simple linear time algorithm for verifying a single tree, and for tree ensembles the verification problem can be cast as a max-clique problem on a multi-partite boxicity graph.

robustness verification, tree-based model, verification problem, (9 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.79)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.61)

Add feedback

A Debiased MDI Feature Importance Measure for Random Forests

Li, Xiao, Wang, Yu, Basu, Sumanta, Kumbier, Karl, Yu, Bin

Neural Information Processing SystemsMar-18-2020, 23:47:53 GMT

Tree ensembles such as Random Forests have achieved impressive empirical success across a wide variety of applications. To understand how these models make predictions, people routinely turn to feature importance measures calculated from tree ensembles. It has long been known that Mean Decrease Impurity (MDI), one of the most widely used measures of feature importance, incorrectly assigns high importance to noisy features, leading to systematic bias in feature selection. In this paper, we address the feature selection bias of MDI from both theoretical and methodological perspectives. Based on the original definition of MDI by Breiman et al. \cite{Breiman1984} for a single tree, we derive a tight non-asymptotic bound on the expected bias of MDI importance of noisy features, showing that deep trees have higher (expected) feature selection bias than shallow ones.

debiased mdi feature importance measure, feature selection bia, random forest, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.69)

Add feedback

Regularized Gradient Boosting

Cortes, Corinna, Mohri, Mehryar, Storcheus, Dmitry

Neural Information Processing SystemsMar-18-2020, 22:46:37 GMT

Gradient Boosting (\GB) is a popular and very successful ensemble method for binary trees. While various types of regularization of the base predictors are used with this algorithm, the theory that connects such regularizations with generalization guarantees is poorly understood. We fill this gap by deriving data-dependent learning guarantees for \GB\ used with \emph{regularization}, expressed in terms of the Rademacher complexities of the constrained families of base predictors. We introduce a new algorithm, called \rgb\, that directly benefits from these generalization bounds and that, at every boosting round, applies the \emph{Structural Risk Minimization} principle to search for a base predictor with the best empirical fit versus complexity trade-off. Inspired by \emph{Randomized Coordinate Descent} we provide a scalable implementation of our algorithm, able to search over large families of base predictors.

algorithm, base predictor, regularization, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.65)

Add feedback