Autoencoding Random Forests
Binh Duc Vu, Jan Kapar, Marvin Wright, David S. Watson
We propose a principled method for autoencoding with random forests. Our strategy builds on foundational results from nonparametric statistics and spectral graph theory to learn a low-dimensional embedding of the model that optimally represents relationships in the data. We provide exact and approximate solutions to the decoding problem via constrained optimization, split relabeling, and nearest neighbors regression. These methods effectively invert the compression pipeline, establishing a map from the embedding space back to the input space using splits learned by the ensemble's constituent trees. The resulting decoders are universally consistent under common regularity assumptions. The procedure works with supervised or unsupervised models, providing a window into conditional or joint distributions. We demonstrate various applications of this autoencoder, including powerful new tools for visualization, compression, clustering, and denoising. Experiments illustrate the ease and utility of our method in a wide range of settings, including tabular, image, and genomic data.
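The encode-then-decode pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several pieces are assumptions made for illustration: the kernel is a plain co-leaf proximity (the fraction of trees in which two samples share a leaf), the embedding is a diffusion-map style eigendecomposition of the normalized kernel, and the decoder is an off-the-shelf k-nearest-neighbors regressor from scikit-learn. The function names `forest_kernel` and `spectral_embed`, the iris dataset, and all hyperparameters are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsRegressor


def forest_kernel(forest, A, B):
    """Co-leaf proximity: fraction of trees in which two samples share a leaf.

    Assumption: a simple stand-in for the forest kernel used in the paper.
    """
    leaves_a = forest.apply(A)  # shape (n_a, n_trees), leaf index per tree
    leaves_b = forest.apply(B)  # shape (n_b, n_trees)
    return (leaves_a[:, None, :] == leaves_b[None, :, :]).mean(axis=2)


def spectral_embed(K, dim):
    """Diffusion-map style embedding from a symmetric, nonnegative kernel."""
    deg = K.sum(axis=1)
    # Symmetric normalization D^{-1/2} K D^{-1/2} keeps the eigenproblem symmetric.
    S = K / np.sqrt(np.outer(deg, deg))
    vals, vecs = np.linalg.eigh(S)          # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]          # reorder to descending
    vals, vecs = vals[order], vecs[:, order]
    # Convert to right eigenvectors of the random-walk matrix D^{-1} K.
    psi = vecs / np.sqrt(deg)[:, None]
    # Skip the first (constant) eigenvector; scale the rest by their eigenvalues.
    return psi[:, 1:dim + 1] * vals[1:dim + 1]


X, y = load_iris(return_X_y=True)

# Supervised forest; the abstract notes that unsupervised forests also work.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

K = forest_kernel(forest, X, X)   # (n, n) kernel matrix
Z = spectral_embed(K, dim=2)      # (n, 2) low-dimensional codes

# Decoder: map codes back to input space by averaging nearby training inputs.
decoder = KNeighborsRegressor(n_neighbors=5).fit(Z, X)
X_hat = decoder.predict(Z)

print("reconstruction MSE:", float(np.mean((X - X_hat) ** 2)))
```

This sketch only reconstructs the training points. Encoding unseen samples would need an out-of-sample extension of the embedding, which is omitted here.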
May-28-2025
- Country:
  - Asia > Singapore (0.04)
  - Antarctica (0.04)
  - North America > United States
    - Massachusetts
      - Middlesex County > Cambridge (0.04)
      - Suffolk County > Boston (0.04)
    - Florida > Palm Beach County
      - Boca Raton (0.04)
  - Europe
    - United Kingdom > England
      - Cambridgeshire > Cambridge (0.04)
    - Portugal > Castelo Branco
      - Castelo Branco (0.04)
    - Middle East > Malta
      - Port Region > Southern Harbour District > Floriana (0.04)
    - Germany > Bremen
      - Bremen (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Education (1.00)
  - Health & Medicine
    - Therapeutic Area (0.68)
    - Pharmaceuticals & Biotechnology (0.48)
- Technology: