The Gradient Boosting Machine is a powerful ensemble machine learning algorithm that uses decision trees. Boosting is a general ensemble technique that involves sequentially adding models to the ensemble where subsequent models correct the performance of prior models. AdaBoost was the first algorithm to deliver on the promise of boosting. Gradient boosting is a generalization of AdaBoosting, improving the performance of the approach and introducing ideas from bootstrap aggregation to further improve the models, such as randomly sampling the samples and features when fitting ensemble members. Gradient boosting performs well, if not the best, on a wide range of tabular datasets, and versions of the algorithm like XGBoost and LightBoost often play an important role in winning machine learning competitions.
To facilitate implementation of high-accuracy deep neural networks especially on resource-constrained devices, maintaining low computation requirements is crucial. Using very deep models for classification purposes not only decreases the neural network training speed and increases the inference time, but also need more data for higher prediction accuracy and to mitigate false positives. In this paper, we propose an efficient and lightweight deep classification ensemble structure based on a combination of simple color features, which is particularly designed for "high-accuracy" image classifications with low false positives. We designed, implemented, and evaluated our approach for explosion detection use-case applied to images and videos. Our evaluation results based on a large test test show considerable improvements on the prediction accuracy compared to the popular ResNet-50 model, while benefiting from 7.64x faster inference and lower computation cost. While we applied our approach to explosion detection, our approach is general and can be applied to other similar classification use cases as well. Given the insight gained from our experiments, we hence propose a "think small, think many" philosophy in classification scenarios: that transforming a single, large, monolithic deep model into a verification-based step model ensemble of multiple small, simple, lightweight models with narrowed-down color spaces can possibly lead to predictions with higher accuracy.
Model ensembles can achieve lower generalization error than single models but are challenging to develop with deep learning neural networks given the computational cost of training each single model. An alternative is to train multiple model snapshots during a single training run and combine their predictions to make an ensemble prediction. A limitation of this approach is that the saved models will be similar, resulting in similar predictions and predictions errors and not offering much benefit from combining their predictions. Effective ensembles require a diverse set of skillful ensemble members that have differing distributions of prediction errors. One approach to promoting a diversity of models saved during a single training run is to use an aggressive learning rate schedule that forces large changes in the model weights and, in turn, the nature of the model saved at each snapshot. In this tutorial, you will discover how to develop snapshot ensembles of models saved using an aggressive learning rate schedule over a single training run. How to Develop a Snapshot Ensemble Deep Learning Neural Network in Python With Keras Photo by Jason Jacobs, some rights reserved.
In this article supervised learning problems are solved using soft rule ensembles. We first review the importance sampling learning ensembles (ISLE) approach that is useful for generating hard rules. The soft rules are then obtained with logistic regression from the corresponding hard rules. In order to deal with the perfect separation problem related to the logistic regression, Firth's bias corrected likelihood is used. Various examples and simulation results show that soft rule ensembles can improve predictive performance over hard rule ensembles.
Ensembles, where multiple neural networks are trained individually and their predictions are averaged, have been shown to be widely successful for improving both the accuracy and predictive uncertainty of single neural networks. However, an ensemble's cost for both training and testing increases linearly with the number of networks, which quickly becomes untenable. In this paper, we propose BatchEnsemble 1, an ensemble method whose computational and memory costs are significantly lower than typical ensembles. BatchEnsemble achieves this by defining each weight matrix to be the Hadamard product of a shared weight among all ensemble members and a rank-one matrix per member. Unlike ensembles, BatchEnsemble is not only parallelizable across devices, where one device trains one member, but also parallelizable within a device, where multiple ensemble members are updated simultaneously for a given mini-batch. Across CIFAR-10, CIFAR-100, WMT14 EN-DE/EN-FR translation, and out-of-distribution tasks, BatchEnsemble yields competitive accuracy and uncertainties as typical ensembles; the speedup at test time is 3X and memory reduction is 3X at an ensemble of size 4. We also apply BatchEnsemble to lifelong learning, where on Split-CIFAR-100, BatchEnsemble yields comparable performance to progressive neural networks while having a much lower computational and memory costs. We further show that BatchEnsemble can easily scale up to lifelong learning on Split-ImageNet which involves 100 sequential learning tasks.