MABSplit: Faster Forest Training Using Multi-Armed Bandits

Tiwari, Mo, Kang, Ryan, Lee, Je-Yong, Thrun, Sebastian, Piech, Chris, Shomorony, Ilan, Zhang, Martin Jinye

Random forests are some of the most widely used machine learning models today, especially in domains that necessitate interpretability. We present an algorithm that accelerates the training of random forests and other popular tree-based learning methods. At the core of our algorithm is a novel node-splitting subroutine, dubbed MABSplit, used to efficiently find split points when constructing decision trees. Our algorithm borrows techniques from the multi-armed bandit literature to judiciously determine how to allocate samples and computational power across candidate split points. We provide theoretical guarantees that MABSplit improves the sample complexity of each node split from linear to logarithmic in the number of data points. In some settings, MABSplit leads to 100x faster training (a 99% reduction in training time) without any decrease in generalization performance. We demonstrate similar speedups when MABSplit is used across a variety of forest-based variants, such as Extremely Random Forests and Random Patches. We also show our algorithm can be used in both classification and regression tasks. Finally, we show that MABSplit outperforms existing methods in generalization performance and feature importance calculations under a fixed computational budget.

Appendix 1 Proofs

In this section, we present the proof of Theorem 1.
Proof. Let B denote the batch size chosen for MABSplit. Note that there are at most n/B rounds of the main while loop (Line 6) of Algorithm 1, and hence at most nmT/B confidence intervals are computed across all arms and all steps of the algorithm (at most mT arms, each receiving one confidence interval per round). Next, we prove the correctness of Algorithm 1. Since the main while loop can run at most n/B times, the algorithm must terminate. Furthermore, if all confidence intervals throughout the algorithm are correct, it is impossible for the optimal arm (f*, t*) to be removed from the set of candidate arms. This proves the correctness of Algorithm 1. Finally, we consider the complexity of Algorithm 1. Let n_used be the total number of arm pulls computed for each arm remaining in the set of candidate arms at a given point in the algorithm.
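To make the batched elimination loop concrete, the following is a minimal Python sketch of a bandit-style node split under the structure described above. The names (mab_split, _impurity_estimate), the Gini estimator, the Hoeffding-style confidence radius with scale sigma, and all parameter defaults are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np


def _impurity_estimate(left_mask, y):
    """Weighted Gini impurity of the two children induced by a split (illustrative)."""
    def gini(labels):
        if len(labels) == 0:
            return 0.0
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    n = len(y)
    n_left = int(left_mask.sum())
    return (n_left / n) * gini(y[left_mask]) + ((n - n_left) / n) * gini(y[~left_mask])


def mab_split(X, y, thresholds_per_feature=10, batch_size=100, delta=0.01, sigma=0.5):
    """Bandit-style node split: each (feature, threshold) pair is one arm.

    Successive elimination with batched sampling and Hoeffding-style
    confidence intervals; a sketch, not the paper's reference code.
    Returns the surviving (feature, threshold) arm and the total number
    of (sample, arm) queries issued.
    """
    n, m = X.shape
    # Candidate arms: mT (feature, threshold) pairs from per-feature quantiles.
    arms = [(f, t) for f in range(m)
            for t in np.quantile(X[:, f], np.linspace(0.1, 0.9, thresholds_per_feature))]
    candidates = list(range(len(arms)))
    sums = np.zeros(len(arms))       # running sum of per-batch impurity estimates
    n_batches = np.zeros(len(arms))  # number of batches drawn per arm
    queries = 0

    rng = np.random.default_rng(0)
    for _ in range(max(n // batch_size, 1)):  # the while loop runs at most n/B times
        if len(candidates) <= 1:
            break
        batch = rng.choice(n, size=min(batch_size, n), replace=False)
        for a in candidates:
            f, t = arms[a]
            sums[a] += _impurity_estimate(X[batch, f] <= t, y[batch])
            n_batches[a] += 1
        queries += len(candidates) * len(batch)
        means = sums[candidates] / n_batches[candidates]
        # Confidence radius per arm; sigma is an assumed sub-Gaussian scale.
        radius = sigma * np.sqrt(2.0 * np.log(1.0 / delta) / n_batches[candidates])
        best_ucb = np.min(means + radius)  # lower impurity is better
        # Keep only arms whose lower bound is still at most the best upper bound.
        candidates = [a for a, mu, r in zip(candidates, means, radius)
                      if mu - r <= best_ucb]

    best = min(candidates, key=lambda a: sums[a] / max(n_batches[a], 1))
    return arms[best], queries
```

Under the theorem's guarantee, the number of queries issued by a loop of this form grows only logarithmically in n when the gaps between arms are fixed, since suboptimal arms are eliminated after few batches.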
In Theorem 1, we demonstrated that MABSplit scales logarithmically in the dataset size. Appendix Figure 1(a) shows the number of data points queried by MABSplit for a single node split, i.e., a single call to MABSplit, as the dataset size increases, for various subset sizes of MNIST. For each sample size, a sample is drawn with replacement from the original MNIST dataset.
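A hypothetical harness for this measurement might look as follows. It reuses the mab_split sketch above and fetches MNIST via scikit-learn's fetch_openml; the sample sizes are illustrative, not the paper's.

```python
import numpy as np
from sklearn.datasets import fetch_openml

# Hypothetical harness for the scaling measurement described above; it reuses
# the mab_split sketch from the previous block. Sample sizes are illustrative.
X_full, y_full = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)

rng = np.random.default_rng(0)
for n in [1_000, 5_000, 10_000, 50_000]:
    idx = rng.choice(len(X_full), size=n, replace=True)  # draw with replacement
    (feature, threshold), queries = mab_split(X_full[idx], y_full[idx])
    print(f"n={n}: queried {queries} (point, arm) pairs for one node split")
```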