xRFM: Accurate, scalable, and interpretable feature learning models for tabular data
Daniel Beaglehole, David Holzmüller, Adityanarayanan Radhakrishnan, Mikhail Belkin
Tabular data, collections of continuous and categorical variables organized into matrices, underlies all aspects of modern commerce and science, from airplane engines to biology labs to bagel shops. Yet while machine learning and AI for language and vision have seen unprecedented progress, the primary methodologies for prediction from tabular data have been relatively static, dominated by variations of Gradient Boosted Decision Trees (GBDTs) such as XGBoost [7]. Nevertheless, hundreds of tabular datasets have been assembled to form extensive regression and classification benchmarks [11, 12, 16, 35, 37], and there has recently been renewed interest in building state-of-the-art predictive models for tabular data [15, 18, 19]. Notably, given the remarkable effectiveness of large "foundation" models for text, there has been much excitement about developing similar models for tabular data, and recent efforts have led to TabPFN-v2, a foundation model for tabular data published in Nature [18]. Despite this progress, tabular data remains an active area for model development, and building scalable, effective, and interpretable machine learning models in this domain remains an open challenge. In this work, we introduce xRFM, a tabular predictive model that combines recent advances in feature learning kernel machines with an adaptive tree structure, making it effective, scalable, and interpretable.
Aug-15-2025