Ensemble Learning
ADABOOK & MULTIBOOK: Adaptive Boosting with Chance Correction
There has been considerable interest in boosting and bagging, including the combination of the adaptive techniques of AdaBoost with the random selection-with-replacement techniques of Bagging. At the same time, the way we evaluate has been revisited, with chance-corrected measures like Kappa, Informedness, Correlation or ROC AUC being advocated. This raises the question of whether learning algorithms can do better by optimizing an appropriate chance-corrected measure. Indeed, a weak learner can optimize Accuracy to the detriment of the more realistic chance-corrected measures, and when this happens the booster can give up too early. This phenomenon is known to occur with conventional Accuracy-based AdaBoost, and the MultiBoost algorithm was developed to overcome such problems using restart techniques based on bagging. This paper thus complements the theoretical work showing the necessity of using chance-corrected measures for evaluation with empirical work showing how the use of a chance-corrected measure can improve boosting. We show that the early surrender problem occurs in MultiBoost too, in multiclass situations, so that chance-corrected AdaBook and MultiBook can beat standard MultiBoost or AdaBoost, and we further identify which chance-corrected measures to use in which situations.
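To make the Accuracy-versus-chance-correction point concrete, here is a minimal sketch (not the AdaBook algorithm itself) that scores a binary confusion matrix with Accuracy, Cohen's Kappa, and Informedness (true positive rate + true negative rate - 1 in the binary case). The function name `binary_measures` and the example counts are illustrative assumptions. A learner that always predicts the majority class scores well on Accuracy but zero on both chance-corrected measures.

```python
def binary_measures(tp, fn, fp, tn):
    """Return (accuracy, kappa, informedness) for a 2x2 confusion matrix.

    tp/fn: actual positives predicted positive/negative.
    fp/tn: actual negatives predicted positive/negative.
    """
    n = tp + fn + fp + tn
    accuracy = (tp + tn) / n
    # Chance agreement: product of the actual and predicted marginals, per class.
    expected = ((tp + fn) / n) * ((tp + fp) / n) + ((fp + tn) / n) * ((fn + tn) / n)
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    # Informedness (Youden's J): true positive rate + true negative rate - 1.
    tpr = tp / (tp + fn) if tp + fn else 0.0
    tnr = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, kappa, tpr + tnr - 1


# Hypothetical data: 90 actual positives, 10 actual negatives.
majority = binary_measures(tp=90, fn=0, fp=10, tn=0)  # always predicts "positive"
informed = binary_measures(tp=80, fn=10, fp=2, tn=8)  # actually uses the features
print(majority)  # (0.9, 0.0, 0.0)
print(informed)
```

The informed learner has lower Accuracy (0.88 versus 0.9) yet clearly positive Kappa and Informedness (about 0.51 and 0.69), so a booster that ranks weak learners by Accuracy alone would prefer the uninformative one.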
Machine Learning Applied to Registry Data
Craniosynostosis is the premature fusion of one or more cranial sutures and often requires surgical intervention. Surgery may involve extensive osteotomies, which can lead to substantial blood loss. Currently, there are no consensus recommendations for guiding blood conservation or transfusion in this patient population. The aim of this study is to develop a machine-learning model to predict blood product transfusion requirements for individual pediatric patients undergoing craniofacial surgery. Using data from 2143 patients in the Pediatric Craniofacial Surgery Perioperative Registry, we assessed 6 machine-learning classification and regression models, based on random forest, adaptive boosting (AdaBoost), neural network, gradient boosting machine (GBM), support vector machine, and elastic net methods, with 22 demographic and preoperative features as inputs.
Local Cascade Ensemble for Multivariate Data Classification
Fauvel, Kevin, Fromont, Élisa, Masson, Véronique, Faverdin, Philippe, Termier, Alexandre
We present LCE, a Local Cascade Ensemble for traditional (tabular) multivariate data classification, and its extension LCEM for Multivariate Time Series (MTS) classification. LCE is a new hybrid ensemble method that combines an explicit boosting-bagging approach to handle the bias-variance tradeoff faced by machine learning models and an implicit divide-and-conquer approach to individualize classifier errors on different parts of the training data.
There are three main reasons that justify the use of ensembles over single classifiers [Dietterich, 2000]: statistical (reducing the risk of choosing the wrong classifier by averaging when the amount of training data available is too small compared to the size of the hypothesis space), computational (local search from many different starting points may provide a better approximation to the true unknown function than any of the individual classifiers), and representational (expansion of the space of representable functions).
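The averaging intuition behind the statistical argument is often illustrated with a textbook calculation, shown here as a simplification that is not part of LCE: if n classifiers make independent errors and each is correct with probability p > 0.5, a majority vote is correct more often than any single member. Independence is a strong assumption that real ensembles rarely satisfy, and the function name is hypothetical.

```python
from math import comb


def majority_vote_accuracy(n_classifiers: int, p: float) -> float:
    """Probability that a strict majority of n independent binary classifiers,
    each correct with probability p, votes for the correct label."""
    if n_classifiers % 2 == 0:
        raise ValueError("use an odd number of classifiers to avoid ties")
    needed = n_classifiers // 2 + 1
    # Sum the binomial probabilities of having at least `needed` correct votes.
    return sum(
        comb(n_classifiers, k) * p**k * (1 - p) ** (n_classifiers - k)
        for k in range(needed, n_classifiers + 1)
    )


for n in (1, 3, 11, 21):
    print(n, round(majority_vote_accuracy(n, 0.6), 3))
```

With p = 0.6 the vote accuracy rises from 0.6 for one classifier to 0.648 for three and keeps climbing as n grows; with p < 0.5 the same formula shows the ensemble getting worse, which is why members must be better than chance.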
Uncovering Feature Interdependencies in Complex Systems with Non-Greedy Random Forests
Donick, Delilah, Lera, Sandro Claudio
A "non-greedy" variation of the random forest algorithm is presented to better uncover feature interdependencies inherent in complex systems. Conventionally, random forests are built from "greedy" decision trees, each of which considers only one split at a time during construction. In contrast, the decision trees in this random forest algorithm each consider three split nodes simultaneously, in tiers of depth two. It is demonstrated on synthetic data and bitcoin price time series that the non-greedy version significantly outperforms the greedy one if certain non-linear relationships between feature pairs are present. In particular, both greedy and non-greedy random forests are trained to predict the signs of daily bitcoin returns, and a long-short trading strategy based on their predictions is backtested. The better performance of the non-greedy algorithm is explained by the presence of "XOR-like" relationships between long-term and short-term technical indicators. When no such relationships exist, performance is similar. Given its enhanced ability to capture the feature interdependencies present in complex systems, this non-greedy extension should become a standard method in the toolkit of data scientists.
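The XOR effect can be reproduced with a small Gini-impurity sketch (Gini is an assumption here; the split criterion used by the authors is not stated in this summary). On XOR-labelled data no single split reduces impurity, so a greedy learner sees zero gain, whereas evaluating a whole depth-two tier (a root split plus one split in each child, three nodes chosen jointly) separates the classes perfectly. Function names are hypothetical.

```python
from itertools import product


def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def split_impurity(rows, feature):
    """Weighted Gini impurity after one binary split on a 0/1 feature."""
    left = [y for x, y in rows if x[feature] == 0]
    right = [y for x, y in rows if x[feature] == 1]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def tier_impurity(rows, root, left_feature, right_feature):
    """Weighted impurity of a depth-two tier whose three split nodes
    (root, left child, right child) are chosen jointly."""
    n = len(rows)
    total = 0.0
    for value, child_feature in ((0, left_feature), (1, right_feature)):
        branch = [(x, y) for x, y in rows if x[root] == value]
        if branch:
            total += len(branch) / n * split_impurity(branch, child_feature)
    return total


# XOR-like data: the label is x0 XOR x1; every combination appears 5 times.
rows = [((a, b), a ^ b) for a, b in product((0, 1), repeat=2)] * 5
parent = gini([y for _, y in rows])

greedy_gains = [parent - split_impurity(rows, f) for f in (0, 1)]
best_tier = min(tier_impurity(rows, *nodes) for nodes in product((0, 1), repeat=3))
print("greedy gains:", greedy_gains)     # [0.0, 0.0]: no single split helps
print("best tier impurity:", best_tier)  # 0.0: the tier separates XOR perfectly
```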
Random Forest Algorithm in Machine Learning
Random forests, or random decision forests, are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the trees' predicted classes (classification) or the mean prediction of the individual trees (regression). Random forest is a supervised learning algorithm. The "forest" it builds is an ensemble of decision trees, usually trained with the "bagging" method. The general idea of bagging is that combining many models, each trained on a different bootstrap sample (rows drawn with replacement) of the training data, improves the overall result compared with a single model. Random forests add a second source of randomness by considering only a random subset of features at each split, which makes the trees less correlated.
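A toy version of the idea, sketched under simplifying assumptions: each "tree" is a one-split decision stump trained on a bootstrap sample and restricted to a random subset of features, and the forest predicts the mode of the stumps' votes. Real random forests grow much deeper trees and resample features at every split; with a single split per stump the two coincide. All names here are illustrative, not a library API.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Pick the (feature, threshold) among `features` with the fewest training
    errors, predicting the majority label on each side of the threshold."""
    best = None
    for f in features:
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(lab != left_label for lab in left) + sum(lab != right_label for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    _, f, t, left_label, right_label = best
    return lambda row: left_label if row[f] <= t else right_label


def fit_forest(X, y, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]     # bootstrap sample, with replacement
        features = rng.sample(range(d), max_features)  # random feature subset
        trees.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], features))
    return trees


def predict(trees, row):
    """Classification output: the mode of the individual trees' votes."""
    return Counter(tree(row) for tree in trees).most_common(1)[0][0]


# Toy data: class 0 for i < 10, class 1 otherwise; both features carry the signal.
X = [[i, 10 - i] for i in range(20)]
y = [0 if i < 10 else 1 for i in range(20)]
forest = fit_forest(X, y)
print(predict(forest, [0, 10]), predict(forest, [19, -9]))
```

For regression, the same loop would fit regression trees and `predict` would return the mean of the trees' outputs instead of the mode.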
Attention augmented differentiable forest for tabular data
A differentiable forest is an ensemble of decision trees with full differentiability. Its simple tree structure is easy to use and explain. With full differentiability, it can be trained in an end-to-end learning framework with gradient-based optimization methods. In this paper, we propose the tree attention block (TAB) in the framework of differentiable forests. TAB has two operations, squeeze and regulate. The squeeze operation extracts the characteristics of each tree. The regulate operation learns nonlinear relations between these trees. TAB thus learns the importance of each tree and adjusts its weight to improve accuracy. Our experiments on large tabular datasets show that the attention-augmented differentiable forest achieves accuracy comparable to gradient boosted decision trees (GBDT), the state-of-the-art algorithm for tabular datasets. On some datasets, our model achieves higher accuracy than the best GBDT libraries (LightGBM, CatBoost, and XGBoost). The differentiable forest model supports batch training with a batch size much smaller than the size of the training set, so on larger datasets its memory usage is much lower than that of GBDT models. The source code is available at https://github.com/closest-git/QuantumForest.
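The squeeze/regulate description resembles a squeeze-and-excitation gate. The sketch below assumes that design (the exact layers of TAB are not given in this summary): squeeze averages each tree's output over a batch, regulate passes those averages through a ReLU layer and a sigmoid to produce one weight per tree, and the prediction is the gated sum of tree outputs. The weights `w1` and `w2` are hypothetical placeholders that would be learned end-to-end; all names are illustrative.

```python
import math


def tree_attention(tree_outputs, w1, w2):
    """Reweight tree outputs with a squeeze-and-excitation-style gate (hypothetical).

    tree_outputs: per-sample lists of tree outputs, shape [batch][n_trees]
    w1: [hidden][n_trees] weights; w2: [n_trees][hidden] weights
    Returns (per-sample weighted sums, per-tree gates).
    """
    batch, n_trees = len(tree_outputs), len(tree_outputs[0])
    # Squeeze: summarize each tree by its mean output over the batch.
    z = [sum(row[t] for row in tree_outputs) / batch for t in range(n_trees)]
    # Regulate: ReLU hidden layer, then one sigmoid gate per tree.
    h = [max(0.0, sum(w * zt for w, zt in zip(w_row, z))) for w_row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * hj for w, hj in zip(w_row, h)))) for w_row in w2]
    # Weight each tree's output by its gate and sum over trees.
    preds = [sum(g * o for g, o in zip(gates, row)) for row in tree_outputs]
    return preds, gates


outputs = [[1.0, 2.0, -0.5], [3.0, 4.0, 0.5]]  # 2 samples, 3 trees
w1 = [[0.5, -0.2, 0.1], [0.3, 0.3, -0.4]]      # hidden size 2
w2 = [[1.0, -1.0], [0.2, 0.4], [-0.8, 0.6]]
preds, gates = tree_attention(outputs, w1, w2)
print(gates, preds)
```

Because every step is differentiable, the gates can be trained jointly with the trees by gradient descent; with all-zero weights every gate is 0.5 and the block reduces to a uniformly scaled sum.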
XGBoost vs LightGBM on a High Dimensional Dataset
I recently completed a multi-class classification problem given as a take-home assignment for a data scientist position. It was a good opportunity to compare two state-of-the-art implementations of gradient boosted decision trees, XGBoost and LightGBM. Both are powerful enough to rank among the best-performing machine learning models. The dataset contains over 60 thousand observations and 103 numerical features, and the target variable has 9 different classes.
Selective Cascade of Residual ExtraTrees
We propose a novel tree-based ensemble method named Selective Cascade of Residual ExtraTrees (SCORE). SCORE draws inspiration from representation learning, incorporates regularized regression with variable selection features, and utilizes boosting to improve prediction and reduce generalization error. We also develop a variable importance measure to increase the explainability of SCORE. Our computer experiments show that SCORE provides comparable or superior predictive performance relative to ExtraTrees, random forest, gradient boosting machine, and neural networks, and that the proposed variable importance measure for SCORE is comparable to those of the studied benchmark methods. Finally, the predictive performance of SCORE remains stable across hyperparameter values, suggesting potential robustness to hyperparameter specification.
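The boosting part of such a design can be illustrated with a generic residual cascade, sketched under stated assumptions: each stage is an ExtraTrees-style regression stump (random feature, random threshold drawn between the feature's minimum and maximum) fitted to the current residuals, and a learning rate shrinks each stage's contribution. SCORE's selective cascade, regularized regression and variable selection are not reproduced here; all names are hypothetical.

```python
import random


def fit_random_stump(X, residuals, rng):
    """ExtraTrees-style stump: random feature, random threshold, leaf = mean residual."""
    f = rng.randrange(len(X[0]))
    lo = min(row[f] for row in X)
    hi = max(row[f] for row in X)
    t = rng.uniform(lo, hi)
    left = [r for row, r in zip(X, residuals) if row[f] <= t]
    right = [r for row, r in zip(X, residuals) if row[f] > t]
    left_value = sum(left) / len(left) if left else 0.0
    right_value = sum(right) / len(right) if right else 0.0
    return lambda row: left_value if row[f] <= t else right_value


def fit_residual_cascade(X, y, n_stages=200, learning_rate=0.1, seed=0):
    """Each stage fits the residuals left by all previous stages."""
    rng = random.Random(seed)
    base = sum(y) / len(y)
    stages = []
    pred = [base] * len(y)
    for _ in range(n_stages):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_random_stump(X, residuals, rng)
        stages.append(stump)
        pred = [pi + learning_rate * stump(row) for pi, row in zip(pred, X)]
    return lambda row: base + learning_rate * sum(s(row) for s in stages)


def mse(model, X, y):
    return sum((model(row) - yi) ** 2 for row, yi in zip(X, y)) / len(y)


X = [[x] for x in range(10)]
y = [float(x) for x in range(10)]
model = fit_residual_cascade(X, y)
baseline = lambda row: sum(y) / len(y)
print(mse(baseline, X, y), mse(model, X, y))
```

Because each stump's leaf values are least-squares means of the residuals, a learning rate between 0 and 2 guarantees the training error never increases from one stage to the next.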
CHIRPS: Explaining random forest classification
Modern machine learning methods typically produce "black box" models that are opaque to interpretation. Yet demand for such models has been increasing in Human-in-the-Loop processes, that is, processes that require a human agent to verify, approve or reason about the automated decisions before they can be applied. To facilitate interpretation, we propose Collection of High Importance Random Path Snippets (CHIRPS), a novel algorithm for explaining random forest classification per data instance. CHIRPS extracts a decision path from each tree in the forest that contributes to the majority classification and then uses frequent pattern mining to identify the most commonly occurring split conditions. A simple conjunctive rule is then constructed, whose antecedent terms are derived from the attributes that had the most influence on the classification.
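A much-simplified sketch of the path-to-rule idea follows. CHIRPS itself mines frequent patterns and scores candidate rules; this version only counts how often each individual split condition appears across the majority-vote paths and builds a conjunctive rule from the most frequent ones. The paths, feature names and function names are hypothetical.

```python
from collections import Counter


def frequent_conditions(paths, min_support=0.5):
    """Return split conditions found in at least `min_support` of the paths,
    most frequent first. Each condition is (feature, op, threshold)."""
    # dict.fromkeys removes duplicates within a path while keeping a stable order.
    counts = Counter(cond for path in paths for cond in dict.fromkeys(path))
    n = len(paths)
    return [cond for cond, c in counts.most_common() if c / n >= min_support]


def build_rule(conditions, max_terms=3):
    """Conjunctive rule: keep the first (most frequent) condition
    for each (feature, op) pair, up to `max_terms` terms."""
    rule, seen = [], set()
    for feature, op, threshold in conditions:
        if (feature, op) not in seen:
            seen.add((feature, op))
            rule.append((feature, op, threshold))
        if len(rule) == max_terms:
            break
    return rule


def rule_applies(rule, x):
    """True if instance x (a dict of feature values) satisfies every term."""
    return all(x[f] <= t if op == "<=" else x[f] > t for f, op, t in rule)


# Hypothetical paths extracted from the trees that voted for the majority class.
paths = [
    [("age", ">", 40), ("income", "<=", 50)],
    [("age", ">", 40), ("debt", ">", 10)],
    [("income", "<=", 50), ("age", ">", 40)],
    [("debt", ">", 10)],
]
rule = build_rule(frequent_conditions(paths))
print(rule)
```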
Modeling Text with Decision Forests using Categorical-Set Splits
Guillame-Bert, Mathieu, Bruch, Sebastian, Mitrichev, Petr, Mikheev, Petr, Pfeifer, Jan
Decision forest algorithms model data by learning a binary tree structure recursively where every node splits the feature space into two regions, sending examples into the left or right branches. This "decision" is the result of the evaluation of a condition. For example, a node may split input data by applying a threshold to a numerical feature value. Such decisions are learned using (often greedy) algorithms that attempt to optimize a local loss function. Crucially, whether an algorithm exists to find and evaluate splits for a feature type (e.g., text) determines whether a decision forest algorithm can model that feature type at all. In this work, we set out to devise such an algorithm for textual features, thereby equipping decision forests with the ability to directly model text without the need for feature transformation. Our algorithm is efficient during training and the resulting splits are fast to evaluate with our extension of the QuickScorer inference algorithm. Experiments on benchmark text classification datasets demonstrate the utility and effectiveness of our proposal.
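One way to learn such a condition for a bag-of-words feature is sketched below as an assumption, not as the authors' exact algorithm: the condition asks "does the document contain at least one token from the set S?", and S is grown greedily, adding whichever token most lowers the weighted Gini impurity until no token helps. The documents, labels and function names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def split_impurity(docs, labels, token_set):
    """Weighted Gini impurity when documents sharing any token with
    `token_set` go left and all others go right."""
    left = [y for d, y in zip(docs, labels) if d & token_set]
    right = [y for d, y in zip(docs, labels) if not d & token_set]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def greedy_categorical_set_split(docs, labels):
    """Grow the token set one token at a time while impurity decreases."""
    vocab = sorted(set().union(*docs))
    chosen = set()
    best = split_impurity(docs, labels, chosen)
    while True:
        best_token = None
        for token in vocab:
            if token in chosen:
                continue
            score = split_impurity(docs, labels, chosen | {token})
            if score < best - 1e-12:
                best, best_token = score, token
        if best_token is None:
            return chosen, best
        chosen.add(best_token)


docs = [
    {"great", "movie"}, {"awful", "plot"}, {"great", "plot"},
    {"boring", "movie"}, {"excellent"}, {"terrible", "acting"},
]
labels = [1, 0, 1, 0, 1, 0]
print(greedy_categorical_set_split(docs, labels))
```

On this toy data the search picks "great" first and then "excellent", after which the split is pure and no further token lowers the impurity.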