To Bag is to Prune
It is notoriously hard to build a bad Random Forest (RF). Concurrently, RF is perhaps the only standard ML algorithm that blatantly overfits in-sample without any consequence out-of-sample. Standard arguments cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation, as implemented by RF, automatically prune a (latent) true underlying tree. More generally, there is no need to tune the stopping point of a properly randomized ensemble of greedily optimized base learners. Thus, Boosting and MARS are also eligible for automatic (implicit) tuning. I demonstrate the property empirically, on simulated and real data, by showing that these new, completely overfitting ensembles yield out-of-sample performance equivalent to, or better than, that of their tuned counterparts.
Sep-14-2020
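To make the claim concrete, here is a minimal sketch (not the paper's own code): it compares an untuned, "completely overfitting" Random Forest with fully grown trees against a single decision tree whose stopping point is cross-validated. It assumes scikit-learn, and the simulated data-generating process below is purely illustrative, not taken from the paper.

```python
# Minimal illustrative sketch (assumes scikit-learn; the data-generating
# process is hypothetical and not from the paper).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error

# Illustrative simulated data: nonlinear signal plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.5, size=2000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Untuned ensemble: trees grown to purity, no stopping rule tuned.
rf = RandomForestRegressor(n_estimators=500, max_depth=None,
                           min_samples_leaf=1, random_state=0).fit(X_tr, y_tr)

# Tuned counterpart: a single tree whose depth is chosen by cross-validation.
tree = GridSearchCV(DecisionTreeRegressor(random_state=0),
                    {"max_depth": [2, 3, 4, 5, 6, 8, 10]}, cv=5).fit(X_tr, y_tr)

print("RF in-sample MSE    :", mean_squared_error(y_tr, rf.predict(X_tr)))  # near zero
print("RF out-of-sample MSE:", mean_squared_error(y_te, rf.predict(X_te)))
print("Tuned tree OOS MSE  :", mean_squared_error(y_te, tree.predict(X_te)))
```

Under the abstract's claim, the untuned forest should match or beat the tuned tree out-of-sample despite fitting the training data almost perfectly.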