ControlBurn: Nonlinear Feature Selection with Sparse Tree Ensembles
Liu, Brian, Xie, Miaolan, Yang, Haoyue, Udell, Madeleine
ControlBurn is a Python package to construct feature-sparse tree ensembles that support nonlinear feature selection and interpretable machine learning. The algorithms in this package first build large tree ensembles that prioritize basis functions with few features and then select a feature-sparse subset of these basis functions using a weighted lasso optimization criterion. The package includes visualizations to analyze the features selected by the ensemble and their impact on predictions. Hence ControlBurn offers the accuracy and flexibility of tree-ensemble models and the interpretability of sparse generalized additive models. ControlBurn is scalable and flexible: for example, it can use warm-start continuation to compute the regularization path (prediction error for any number of selected features) for a dataset with tens of thousands of samples and hundreds of features in seconds. For larger datasets, the runtime scales linearly in the number of samples and features (up to a log factor), and the package support acceleration using sketching. Moreover, the ControlBurn framework accommodates feature costs, feature groupings, and $\ell_0$-based regularizers. The package is user-friendly and open-source: its documentation and source code appear on https://pypi.org/project/ControlBurn/ and https://github.com/udellgroup/controlburn/.
Jul-8-2022
- Country:
- Pacific Ocean > North Pacific Ocean
- San Francisco Bay (0.04)
- North America > United States
- Wisconsin (0.04)
- Massachusetts > Middlesex County
- Cambridge (0.04)
- California
- San Francisco County > San Francisco (0.04)
- San Diego County > San Diego (0.04)
- Los Angeles County > Los Angeles (0.04)
- Pacific Ocean > North Pacific Ocean
- Genre:
- Research Report (0.50)
- Industry:
- Technology: