When do random forests fail?
Tang, Cheng, Garreau, Damien, Luxburg, Ulrike von
Random forests are learning algorithms that build large collections of random trees and make predictions by averaging the individual tree predictions. In this paper, we consider various tree constructions and examine how the choice of parameters affects the generalization error of the resulting random forests as the sample size goes to infinity. We show that subsampling of data points during the tree construction phase is important: forests can become inconsistent with either no subsampling or too severe subsampling. As a consequence, even highly randomized trees can lead to inconsistent forests if no subsampling is used, which implies that some of the commonly used setups for random forests can be inconsistent. As a second consequence, we show that trees that perform well in nearest-neighbor search can be a poor choice for random forests.
- Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
- North America > United States > District of Columbia > Washington (0.04)
- North America > United States > California > Monterey County > Monterey (0.04)
- (3 more...)
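The subsampling knob the abstract refers to has a rough analogue in scikit-learn's `max_samples` parameter. A minimal sketch of how the regimes can be probed empirically; the dataset, parameter grid, and use of `max_samples` are my choices for illustration, not the paper's constructions, and the bootstrap default is itself a resampling scheme rather than literally "no subsampling":

```python
# Sketch: vary the per-tree subsample size in a random forest.
# max_samples=None is the stock setting closest to "no subsampling"
# (each tree draws n bootstrap samples); very small fractions mimic
# the "too severe" subsampling regime from the abstract.
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=2000, n_features=10, noise=1.0, random_state=0)

for max_samples in (None, 0.5, 0.05, 0.005):
    forest = RandomForestRegressor(
        n_estimators=200,
        max_samples=max_samples,  # fraction of n drawn per tree
        bootstrap=True,           # max_samples only applies with bootstrap=True
        random_state=0,
    )
    score = cross_val_score(forest, X, y, cv=3).mean()
    print(f"max_samples={max_samples}: R^2 ~ {score:.3f}")
```

On a fixed finite sample this only shows the generalization trade-off; the paper's inconsistency results concern the limit as the sample size goes to infinity.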
Reviews: When do random forests fail?
This is a fairly serious omission, and casual readers would remember the wrong conclusions. This must be fixed for publication, but I think it would be straightforward to fix.

Officially, NIPS reviewers are not required to look at the supplementary material. Because I had only three weeks to review six manuscripts, I was not able to make the time during my reviewing. So I worry that publishing this work would mean publishing results without sufficient peer review.

DETAILED COMMENTS
* p. 1: I'm not sure it is accurate to say that deep, unsupervised trees grown with no subsampling are a common setup for learning random forests. This setup appears in Geurts et al. (2006) as a special case, sometimes in mass estimation [1, 2], and sometimes in Wei Fan's random decision tree papers [3-6]. I don't think these are used very much.
The Role of Depth, Width, and Tree Size in Expressiveness of Deep Forest
Lyu, Shen-Huan, Wu, Jin-Hui, Zheng, Qin-Cheng, Ye, Baoliu
Random forests are classical ensemble algorithms that construct multiple randomized decision trees and aggregate their predictions using naive averaging. Zhou and Feng (2019) further propose a deep forest algorithm with multi-layer forests, which outperforms random forests on various tasks. In practice, the performance of deep forests depends on three hyperparameters: depth, width, and tree size, but little is known about their roles theoretically. This work provides the first upper and lower bounds on the approximation complexity of deep forests with respect to these three hyperparameters. Our results confirm the distinctive role of depth, which can exponentially enhance the expressiveness of deep forests compared with width and tree size. Experiments confirm the theoretical findings.
- North America > United States (0.14)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
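As a rough illustration of the multi-layer idea and of where the three hyperparameters enter, here is a minimal cascade sketch. This is not the authors' implementation: real cascade forests (gcForest) pass out-of-fold class probabilities between layers, whereas this sketch fits and predicts on the same data purely to show the data flow; all sizes and names are mine.

```python
# Cascade sketch: each layer is `width` forests whose class probabilities
# are concatenated onto the raw features and fed to the next layer.
# depth = number of layers, width = forests per layer,
# tree_size = trees per forest.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

depth, width, tree_size = 3, 2, 100
F_tr, F_te = X_tr, X_te
for layer in range(depth):
    probas_tr, probas_te = [], []
    for w in range(width):
        # alternate forest types for diversity within a layer
        Forest = RandomForestClassifier if w % 2 == 0 else ExtraTreesClassifier
        f = Forest(n_estimators=tree_size, random_state=layer * width + w)
        f.fit(F_tr, y_tr)
        # NOTE: overfits; the real algorithm uses out-of-fold probabilities
        probas_tr.append(f.predict_proba(F_tr))
        probas_te.append(f.predict_proba(F_te))
    # next layer sees the raw features plus this layer's predictions
    F_tr = np.hstack([X_tr] + probas_tr)
    F_te = np.hstack([X_te] + probas_te)

y_hat = np.mean(probas_te, axis=0).argmax(axis=1)
print("cascade test accuracy:", (y_hat == y_te).mean())
```

The paper's claim, in these terms, is that increasing `depth` can buy exponentially more expressiveness than increasing `width` or `tree_size`.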
Slow-Growing Trees
Random forests' performance can be matched by a single slow-growing tree (SGT), which uses a learning rate to tame CART's greedy algorithm. SGT exploits the view that CART is an extreme case of an iterative weighted least squares procedure. Moreover, a unifying view of Boosted Trees (BT) and Random Forests (RF) is presented: the outcomes of greedy ML algorithms can be improved using either "slow learning" or diversification. SGT applies the former to estimate a single deep tree, while Booging (bagging stochastic BT with a high learning rate) applies the latter with additive shallow trees. The performance of this tree-ensemble quaternity (Booging, BT, SGT, RF) is assessed on simulated and real regression tasks.
- North America > United States > Ohio (0.04)
- North America > United States > California (0.04)
- North America > United States > Pennsylvania (0.04)
- (3 more...)
- Banking & Finance > Economy (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
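Of the quaternity, only the BT and RF corners exist off the shelf; SGT and Booging are the paper's contributions and are not reproduced here. A minimal sketch contrasting the two mechanisms the abstract names, with a dataset and settings of my own choosing:

```python
# Contrast the abstract's two improvement mechanisms on a toy task:
# "slow learning"  -> boosted shallow trees with a small learning rate (BT)
# "diversification" -> a bagged forest of deep trees (RF)
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_friedman1(n_samples=1000, noise=1.0, random_state=0)

models = {
    "BT (slow learning)": GradientBoostingRegressor(
        n_estimators=500, learning_rate=0.05, max_depth=2,
        subsample=0.5, random_state=0,  # subsample < 1 -> stochastic BT
    ),
    "RF (diversification)": RandomForestRegressor(
        n_estimators=500, random_state=0,
    ),
}
for name, model in models.items():
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```

SGT would sit between these corners: like BT it damps greedy updates with a learning rate, but the shrunken splits accumulate inside one deep tree rather than across additive shallow trees.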