To Bag is to Prune
It is notoriously hard to build a bad Random Forest (RF). Concurrently, RF is perhaps the only standard ML algorithm that blatantly overfits in-sample without any consequence out-of-sample. Standard arguments cannot rationalize this paradox. I propose a new explanation: bootstrap aggregation and model perturbation, as implemented by RF, automatically prune a (latent) true underlying tree. More generally, there is no need to tune the stopping point of a properly randomized ensemble of greedily optimized base learners. Thus, Boosting and MARS are also eligible for automatic (implicit) tuning. I demonstrate the property empirically, on simulated and real data, by showing that these new, completely overfitting ensembles yield out-of-sample performance equivalent to, or better than, that of their tuned counterparts.
Sep-14-2020
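To make the claim concrete, here is a minimal sketch (not the paper's own code): it compares an untuned, "completely overfitting" Random Forest with fully grown trees against a single decision tree whose stopping point is cross-validated. It assumes scikit-learn, and the simulated data-generating process below is purely illustrative, not taken from the paper.

```python
# Minimal illustrative sketch (assumes scikit-learn; the data-generating
# process is hypothetical and not from the paper).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.metrics import mean_squared_error

# Illustrative simulated data: nonlinear signal plus noise.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(2000, 5))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(scale=0.5, size=2000)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

# Untuned ensemble: trees grown to purity, no stopping rule tuned.
rf = RandomForestRegressor(n_estimators=500, max_depth=None,
                           min_samples_leaf=1, random_state=0).fit(X_tr, y_tr)

# Tuned counterpart: a single tree whose depth is chosen by cross-validation.
tree = GridSearchCV(DecisionTreeRegressor(random_state=0),
                    {"max_depth": [2, 3, 4, 5, 6, 8, 10]}, cv=5).fit(X_tr, y_tr)

print("RF in-sample MSE    :", mean_squared_error(y_tr, rf.predict(X_tr)))  # near zero
print("RF out-of-sample MSE:", mean_squared_error(y_te, rf.predict(X_te)))
print("Tuned tree OOS MSE  :", mean_squared_error(y_te, tree.predict(X_te)))
```

Under the abstract's claim, the untuned forest should match or beat the tuned tree out-of-sample despite fitting the training data almost perfectly.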