 Ensemble Learning


Boulevard: Regularized Stochastic Gradient Boosted Trees and Their Limiting Distribution

arXiv.org Machine Learning

This paper presents a theoretical study of gradient boosted trees (GBT: Friedman, 2001). Machine learning methods for prediction have generally been thought of as trading off both intelligibility and statistical uncertainty quantification in favor of accuracy. Recent results have started to provide a statistical understanding of methods based on ensembles of decision trees (Breiman et al., 1984). In particular, the consistency of methods related to Random Forests (RFs: Breiman, 2001) has been demonstrated in Biau (2012); Scornet et al. (2015) while Wager et al. (2014); Mentch and Hooker (2016); Wager and Athey (2017) and Athey et al. (2016) prove central limit theorems for RF predictions. These have then been used for tests of variable importance and nonparametric interactions in Mentch and Hooker (2017). In this paper, we extend this analysis to GBT. Analyses of RFs have relied on a subsampling structure to express the estimator in the form of a U-statistic from which central limit theorems can be derived. By contrast, GBT produces trees sequentially with the current tree depending on the values in those built previously, requiring a different analytical approach. While the algorithm proposed in Friedman (2001) is intended to be generally applicable to any loss function, in this paper we focus specifically on nonparametric regression (Stone, 1977, 1982).
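As a rough illustration of the sequential dependence described above, here is a minimal sketch of plain gradient boosting for regression with squared-error loss, where each new tree is fit to the residuals left by the trees before it. This is not the paper's Boulevard algorithm. The one-split "stump" learner, the toy data, and the names `fit_stump`, `gradient_boost`, `n_trees`, and `learning_rate` are all assumptions made for the example.

```python
def fit_stump(xs, residuals):
    """Fit a one-split regression tree (stump) on a 1-D feature by least squares."""
    best = None
    # Candidate thresholds exclude the largest x so both sides are non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm = sum(left) / len(left)
        rm = sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm


def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Build trees one after another; each fits the current residuals."""
    trees = []
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        # For squared-error loss the negative gradient is the residual y - F(x).
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: sum(learning_rate * t(x) for t in trees)


# Toy data (hypothetical): a noisy version of y = x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.1, 0.9, 2.1, 2.9, 4.2, 5.1]
model = gradient_boost(xs, ys)
mse = sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
baseline = sum(y ** 2 for y in ys) / len(ys)  # error of the all-zero starting model
```

Because every tree depends on the predictions of all earlier trees, the ensemble cannot be written as an average over independent subsamples, which is the contrast with random forests drawn in the abstract.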


Machine Learning Best Algorithms: Gradient Boosting Machines (GBM)

#artificialintelligence

We'll have a main talk (30 mins) and 3 excellent lightning talks about the machine learning algorithm that usually achieves the best accuracy on structured/tabular data (e.g. in industry/business applications or in Kaggle competitions). Abstract: With all the hype about deep learning and "AI", it is not well publicized that for the structured/tabular data widely encountered in business applications it is actually another machine learning algorithm, the gradient boosting machine (GBM), that most often achieves the highest accuracy in supervised learning tasks. In this talk we'll review some of the main GBM implementations available as R and Python packages, such as xgboost, h2o and lightgbm. We'll discuss some of their main features and characteristics, and we'll see how tuning GBMs and creating ensembles of the best models can achieve the best prediction accuracy for many business problems. Bio: Szilard studied Physics in the 90s and obtained a PhD by using statistical methods to analyze the risk of financial portfolios. He worked in finance, then more than a decade ago moved to become the Chief Scientist of a tech company in Santa Monica doing everything data (analysis, modeling, data visualization, machine learning, data infrastructure etc.). He is the founder/organizer of several meetups in the Los Angeles area (R, data science etc.) and of the data science community website datascience.la.


A Comprehensive Guide to Ensemble Learning (with Python codes) - Analytics Vidhya

#artificialintelligence

When you want to purchase a new car, will you walk up to the first car shop and purchase one based on the advice of the dealer? You would more likely browse a few web portals where people have posted their reviews and compare different car models, checking their features and prices. You would also probably ask your friends and colleagues for their opinions. In short, you wouldn't jump straight to a conclusion, but would instead make a decision after considering the opinions of other people as well. Ensemble models in machine learning operate on a similar idea. They combine the decisions from multiple models to improve the overall performance.
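To make the idea of combining decisions concrete, here is a minimal sketch of two of the simplest combination rules: a majority vote for classification and a plain average for regression. The labels, the numbers, and the helper names `majority_vote` and `average` are made up for illustration and are not taken from the guide itself.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]


def average(predictions):
    """Return the mean of the models' numeric predictions."""
    return sum(predictions) / len(predictions)


# Three hypothetical classifiers give their opinion on one sample.
votes = ["buy", "skip", "buy"]
decision = majority_vote(votes)

# Three hypothetical regressors estimate a price for the same car.
estimates = [21000.0, 19500.0, 20500.0]
combined = average(estimates)
```

Here `decision` is `"buy"` because two of the three models agree, just as two friends recommending the same car would outweigh one dissenting opinion.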


Comparison-Based Random Forests

arXiv.org Machine Learning

Assume we are given a set of items from a general metric space, but we neither have access to the representation of the data nor to the distances between data points. Instead, suppose that we can actively choose a triplet of items (A,B,C) and ask an oracle whether item A is closer to item B or to item C. In this paper, we propose a novel random forest algorithm for regression and classification that relies only on such triplet comparisons. In the theory part of this paper, we establish sufficient conditions for the consistency of such a forest. In a set of comprehensive experiments, we then demonstrate that the proposed random forest is efficient both for classification and regression. In particular, it is even competitive with other methods that have direct access to the metric representation of the data.
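The following sketch shows one plausible way a tree node could split items using only triplet answers: pick two pivot items and send every other item to the side of the pivot it is closer to. It is an illustration under stated assumptions, not necessarily the procedure used in the paper. The oracle here is simulated from hidden 1-D coordinates that the splitting code never reads directly, and the names `make_oracle` and `triplet_split` are hypothetical.

```python
import random


def make_oracle(hidden_points):
    """Simulated oracle: it knows hidden coordinates but only answers triplet queries."""
    def closer_to_first(a, b, c):
        """Return True if item a is at least as close to item b as to item c."""
        return abs(hidden_points[a] - hidden_points[b]) <= abs(hidden_points[a] - hidden_points[c])
    return closer_to_first


def triplet_split(items, oracle, rng):
    """Split items into two groups using two random pivots and triplet answers only."""
    b, c = rng.sample(items, 2)
    left = [a for a in items if oracle(a, b, c)]
    right = [a for a in items if not oracle(a, b, c)]
    return b, c, left, right


# Hypothetical items 0..5 with hidden positions; the split sees only oracle answers.
hidden = {0: 0.0, 1: 0.2, 2: 0.3, 3: 5.0, 4: 5.1, 5: 5.4}
oracle = make_oracle(hidden)
b, c, left, right = triplet_split(list(hidden), oracle, random.Random(0))
```

Applying such a split recursively yields a tree, and repeating the construction with different random pivots yields a forest, all without access to coordinates or pairwise distances.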


Combining Multiple Algorithms in Classifier Ensembles using Generalized Mixture Functions

arXiv.org Machine Learning

Classifier ensembles are pattern recognition structures composed of a set of classification algorithms (members), organized in a parallel way, and a combination method with the aim of increasing the classification accuracy of a classification system. In this study, we investigate the application of generalized mixture (GM) functions as a new approach for providing an efficient combination procedure for these systems through the use of dynamic weights in the combination process. To this end, we present three GM functions to be applied as a combination method. The main advantage of these functions is that they can define dynamic weights at the member outputs, making the combination process more efficient. In order to evaluate the feasibility of the proposed approach, an empirical analysis is conducted, applying classifier ensembles to 25 different classification data sets. In this analysis, we compare the use of the proposed approaches to ensembles using traditional combination methods as well as state-of-the-art ensemble methods. Our findings indicate performance gains for the proposed approaches over the traditional ones, as well as results comparable to the state-of-the-art methods.


Multi-Layered Gradient Boosting Decision Trees

arXiv.org Machine Learning

Multi-layered representation is believed to be the key ingredient of deep neural networks, especially in cognitive tasks like computer vision. While non-differentiable models such as gradient boosting decision trees (GBDTs) are the dominant methods for modeling discrete or tabular data, they are hard to combine with such representation learning ability. In this work, we propose the multi-layered GBDT forest (mGBDTs), with an explicit emphasis on exploring the ability to learn hierarchical representations by stacking several layers of regression GBDTs as its building block. The model can be jointly trained by a variant of target propagation across layers, without the need to derive back-propagation or to require differentiability. Experiments and visualizations confirmed the effectiveness of the model in terms of performance and representation learning ability.


r/MachineLearning - [D] Error function for AdaBoost Algorithm

@machinelearnbot

I need to solve a task that asks me to provide an error function whose minimization leads to a formulation equivalent to the AdaBoost algorithm. I don't fully understand the question. I know that in the AdaBoost algorithm I first train a "weak" learner by minimizing its error function, then use the weights to compute the errors and move on to a new classifier, repeating this iteratively. So what does it mean to ask for an error function to minimize?
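For context, the standard textbook answer to this kind of question is the exponential loss, the sum over samples of exp(-y_i F(x_i)) with labels y_i in {-1, +1}: choosing each round's classifier weight to minimize it recovers the usual AdaBoost weight formula and sample reweighting. The sketch below checks this numerically for a single round. The toy labels and the function names `exponential_loss` and `adaboost_round` are made up for the example.

```python
import math


def exponential_loss(ys, scores):
    """Sum of exp(-y * F(x)) over all samples, with labels y in {-1, +1}."""
    return sum(math.exp(-y * f) for y, f in zip(ys, scores))


def adaboost_round(ys, weak_preds, weights):
    """One AdaBoost round for a weak learner with predictions in {-1, +1}.

    Assumes the weighted error is strictly between 0 and 1.
    """
    err = sum(w for w, y, h in zip(weights, ys, weak_preds) if y != h) / sum(weights)
    alpha = 0.5 * math.log((1 - err) / err)
    new_weights = [w * math.exp(-alpha * y * h) for w, y, h in zip(weights, ys, weak_preds)]
    return alpha, new_weights


# Toy round (hypothetical data): the weak learner gets one of five samples wrong.
ys = [1, 1, -1, -1, 1]
weak_preds = [1, -1, -1, -1, 1]
alpha, new_weights = adaboost_round(ys, weak_preds, [1.0] * len(ys))


def loss_at(a):
    """Exponential loss of the one-term ensemble F(x) = a * h(x)."""
    return exponential_loss(ys, [a * h for h in weak_preds])
```

In this toy round `alpha` equals ln 2, `loss_at(alpha)` is smaller than the loss at nearby values of the weight, and the updated sample weights coincide with the per-sample terms exp(-y_i F(x_i)) of the loss.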


Prediction Rule Reshaping

arXiv.org Machine Learning

Two methods are proposed for high-dimensional shape-constrained regression and classification. These methods reshape pre-trained prediction rules to satisfy shape constraints like monotonicity and convexity. The first method can be applied to any pre-trained prediction rule, while the second method deals specifically with random forests. In both cases, efficient algorithms are developed for computing the estimators, and experiments are performed to demonstrate their performance on four datasets. We find that reshaping methods enforce shape constraints without compromising predictive accuracy.


Wavelet Decomposition of Gradient Boosting

arXiv.org Machine Learning

In this paper we introduce a significant improvement to the popular tree-based Stochastic Gradient Boosting algorithm using a wavelet decomposition of the trees. This approach is based on elements of harmonic analysis and approximation theory, and as we show through extensive experimentation, our wavelet-based method generally outperforms existing methods, particularly in difficult scenarios of class imbalance and mislabeling in the training data.


Complete Analysis of a Random Forest Model

arXiv.org Machine Learning

Random forests have become an important tool for improving accuracy in regression problems since their popularization by Breiman (2001) and others. In this paper, we revisit a random forest model originally proposed by Breiman (2004) and later studied by Biau (2012), where a feature is selected at random and the split occurs at the midpoint of the block containing the chosen feature. If the regression function is sparse and depends only on a small, unknown subset of $ S $ out of $ d $ features, we show that given $ n $ observations, this random forest model outputs a predictor that has a mean-squared prediction error of order $ \left(n\sqrt{\log^{S-1} n}\right)^{-\frac{1}{S\log2+1}} $. When $ S \leq \lfloor 0.72 d \rfloor $, this rate is better than the minimax optimal rate $ n^{-\frac{2}{d+2}} $ for $ d $-dimensional, Lipschitz function classes. As a consequence of our analysis, we show that the variance of the forest decays with the depth of the tree at a rate that is independent of the ambient dimension, even when the trees are fully grown. In particular, if $ \ell_{avg} $ (resp. $ \ell_{max} $) is the average (resp. maximum) number of observations per leaf node, we show that the variance of this forest is $ \Theta\left(\ell^{-1}_{avg}(\sqrt{\log n})^{-(S-1)}\right) $, which for the case of $ S = d $, is similar in form to the lower bound $ \Omega\left(\ell^{-1}_{max}(\log n)^{-(d-1)}\right) $ of Lin and Jeon (2006) for any random forest model with a nonadaptive splitting scheme. We also show that the bias is tight for any linear model with nonzero parameter vector. Thus, we completely characterize the fundamental limits of this random forest model. Our new analysis also implies that better theoretical performance can be achieved if the trees are grown less aggressively (i.e., grown to a shallower depth) than previous work would otherwise recommend.
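The splitting rule described in this abstract (pick a feature at random, split the current block at its midpoint along that feature) can be sketched as follows. This is a minimal illustration on the unit cube, not the authors' code. The fixed depth, the number of trees, the toy data, the choice to predict 0.0 in empty leaves, and the names `build`, `predict_tree`, and `predict_forest` are all assumptions made for the example.

```python
import random


def build(points, targets, lower, upper, depth, rng):
    """Grow a tree whose splits ignore the targets: random feature, midpoint split."""
    if depth == 0:
        # Leaf value: mean target in the cell (0.0 for an empty cell, an arbitrary choice).
        return ("leaf", sum(targets) / len(targets) if targets else 0.0)
    j = rng.randrange(len(lower))      # feature chosen uniformly at random
    mid = (lower[j] + upper[j]) / 2    # midpoint of the current block along feature j
    li = [i for i, p in enumerate(points) if p[j] <= mid]
    ri = [i for i, p in enumerate(points) if p[j] > mid]
    left_upper = upper[:j] + [mid] + upper[j + 1:]
    right_lower = lower[:j] + [mid] + lower[j + 1:]
    left = build([points[i] for i in li], [targets[i] for i in li], lower, left_upper, depth - 1, rng)
    right = build([points[i] for i in ri], [targets[i] for i in ri], right_lower, upper, depth - 1, rng)
    return ("node", j, mid, left, right)


def predict_tree(tree, x):
    """Walk down the tree to the leaf containing x and return its value."""
    while tree[0] == "node":
        _, j, mid, left, right = tree
        tree = left if x[j] <= mid else right
    return tree[1]


def predict_forest(trees, x):
    """Average the predictions of all trees."""
    return sum(predict_tree(t, x) for t in trees) / len(trees)


# Toy sparse setting (hypothetical): d = 3 features, the target depends on one of them.
rng = random.Random(0)
xs = [[rng.random() for _ in range(3)] for _ in range(200)]
ys = [x[0] for x in xs]
trees = [build(xs, ys, [0.0] * 3, [1.0] * 3, depth=4, rng=rng) for _ in range(25)]
prediction = predict_forest(trees, xs[0])
```

Because the splits never look at the targets, the scheme is nonadaptive in the sense used in the abstract; the `depth` argument is the knob that corresponds to growing the trees more or less aggressively.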