TabDPT: Scaling Tabular Foundation Models

Junwei Ma, Valentin Thomas, Rasa Hosseinzadeh, Hamidreza Kamkari, Alex Labach, Jesse C. Cresswell, Keyvan Golestan, Guangwei Yu, Maksims Volkovs, Anthony L. Caterini

arXiv.org Machine Learning 

The challenges faced by neural networks on tabular data are well documented and have hampered the progress of tabular foundation models. Techniques leveraging in-context learning (ICL) have shown promise here, allowing models to adapt dynamically to unseen data. ICL can provide predictions for entirely new datasets without further training or hyperparameter tuning, and therefore offers very fast inference on novel tasks. However, scaling ICL for tabular data remains an issue: approaches based on large language models cannot efficiently process numeric tables, and tabular-specific techniques have not been able to effectively harness real data to improve performance and generalization. We overcome these challenges by training tabular-specific ICL-based architectures on real data with self-supervised learning and retrieval, combining the best of both worlds. Our resulting model, the Tabular Discriminative Pre-trained Transformer (TabDPT), achieves state-of-the-art performance on the CC18 (classification) and CTR23 (regression) benchmarks with no task-specific fine-tuning, demonstrating the adaptability and speed of ICL once the model is pre-trained. TabDPT also shows strong scaling as both model size and the amount of available data increase, pointing towards future improvements simply through curating larger tabular pre-training datasets and training larger models; details are in Section 5.2.

Tree-based approaches, by contrast, have demonstrated the practical ability to handle the idiosyncrasies of tabular data more gracefully, although they require costly rounds of training and hyperparameter tuning on each new dataset to achieve good results. Indeed, it is unlikely that tree-based models will ever provide training-free generalization to unseen data, which we have grown to expect of foundation models in other domains, and so we continue to pursue neural approaches despite the current challenges. In-context learning (ICL), the phenomenon where a model generalizes to new tasks using only in-context examples with no additional fine-tuning, is one promising avenue for building neural networks that can dynamically adapt to input data. ICL was first observed in large language models (LLMs) (Brown et al., 2020), which have even demonstrated some ability to perform inference on smaller tabular datasets (Han et al., 2024; Gardner et al., 2024). Since tables are not text, however, applying LLMs to tabular data is challenging: cell-by-cell textual tokenization in particular is highly inefficient and makes context size a major limitation (Fang et al., 2024). This has hindered the adoption of LLM-based ICL techniques in practical tabular settings.
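To make the retrieval-plus-ICL workflow concrete, the sketch below shows one way a pre-trained tabular ICL model could be used at inference time: nearest-neighbour retrieval selects a small context of labelled training rows for each query, and a single forward pass over that context yields a prediction with no per-dataset training or hyperparameter tuning. This is only an illustrative sketch; the `icl_predict` stub, the choice of k = 32 retrieved neighbours, and the example dataset are assumptions and do not reflect the paper's actual interface or architecture.

```python
# Minimal sketch of retrieval-augmented in-context prediction for tabular data.
# The pre-trained ICL model is represented by a placeholder; `icl_predict` is a
# hypothetical name, not part of any published TabDPT API.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors


def icl_predict(context_X, context_y, query_x):
    """Placeholder for one forward pass of a pre-trained tabular ICL model.

    A real model would attend over the retrieved (X, y) context and emit a
    prediction for the query row without any gradient updates. Here we simply
    return the majority label of the retrieved context (ignoring the query
    features) as a runnable stand-in.
    """
    values, counts = np.unique(context_y, return_counts=True)
    return values[np.argmax(counts)]


X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Retrieval step: for each query row, fetch its k nearest training rows to form
# a compact labelled context that fits within the model's context window.
k = 32
index = NearestNeighbors(n_neighbors=k).fit(X_train)
_, neighbor_ids = index.kneighbors(X_test)

# "Inference" is just one context-conditioned prediction per query row.
preds = np.array([
    icl_predict(X_train[ids], y_train[ids], x)
    for ids, x in zip(neighbor_ids, X_test)
])
print("accuracy:", (preds == y_test).mean())
```

The key design point this illustrates is that adapting to a new dataset reduces to building a context, not running an optimization loop, which is why inference on novel tasks can be fast once the ICL model is pre-trained.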