This post was co-written with Baptiste Rocca. This old saying expresses pretty well the underlying idea that rules the very powerful "ensemble methods" in machine learning. Roughly, ensemble learning methods, that often trust the top rankings of many machine learning competitions (including Kaggle's competitions), are based on the hypothesis that combining multiple models together can often produce a much more powerful model. The purpose of this post is to introduce various notions of ensemble learning. We will give the reader some necessary keys to well understand and use related methods and be able to design adapted solutions when needed.
Ensemble methods are meta-algorithms that combine several machine learning techniques into one predictive model in order to decrease variance (bagging), bias (boosting), or improve predictions (stacking). Most ensemble methods use a single base learning algorithm to produce homogeneous base learners, i.e. learners of the same type, leading to homogeneous ensembles. There are also some methods that use heterogeneous learners, i.e. learners of different types, leading to heterogeneous ensembles. In order for ensemble methods to be more accurate than any of its individual members, the base learners have to be as accurate as possible and as diverse as possible. Bagging stands for bootstrap aggregation.
To expand: according to my naive understanding, boosted trees are basically an ensemble of decision trees which are fit sequentially so that each new tree makes up for the errors of the previously existing set of trees. The model is "boosted" by focusing new additions on correcting the residual errors of the last version of the model. The "gradient" comes in afterward as the parameters of the tree ensemble are optimized to minimize the error of the whole "base learner". I think of this as fine tuning of the boosted tree ensemble using a gradient-based optimization.
Model inversion (MI), where an adversary abuses access to a trained Machine Learning (ML) model in order to infer sensitive information about the model's original training data, has gotten a lot of attention in recent years. The trained model under assault is frequently frozen during MI and used to direct the training of a generator, such as a Generative Adversarial Network, to rebuild the distribution of the model's original training data. As a result, scrutiny of the capabilities of MI techniques is essential for the creation of appropriate protection techniques. Reconstruction of training data with high quality using a single model is complex. However, existing MI literature does not consider targeting many models simultaneously, which could offer the adversary extra information and viewpoints.
Mixture of experts is an ensemble learning technique developed in the field of neural networks. It involves decomposing predictive modeling tasks into sub-tasks, training an expert model on each, developing a gating model that learns which expert to trust based on the input to be predicted, and combines the predictions. Although the technique was initially described using neural network experts and gating models, it can be generalized to use models of any type. As such, it shows a strong similarity to stacked generalization and belongs to the class of ensemble learning methods referred to as meta-learning. In this tutorial, you will discover the mixture of experts approach to ensemble learning.