Retrieval & Fine-Tuning for In-Context Tabular Models

Thomas, Valentin, Ma, Junwei, Hosseinzadeh, Rasa, Golestan, Keyvan, Yu, Guangwei, Volkovs, Maksims, Caterini, Anthony

Jun-7-2024–arXiv.org Artificial Intelligence

Tabular data is a pervasive modality spanning a wide range of domains, and the inherent diversity poses a considerable challenge for deep learning. Recent advancements using transformer-based in-context learning have shown promise on smaller and less complex datasets, but have struggled to scale to larger and more complex ones. To address this limitation, we propose a combination of retrieval and fine-tuning: we can adapt the transformer to a local subset of the data by collecting nearest neighbours, and then perform task-specific fine-tuning with this retrieved set of neighbours in context. Using TabPFN as the base model -- currently the best tabular in-context learner -- and applying our retrieval and fine-tuning scheme on top results in what we call a locally-calibrated PFN, or LoCalPFN. We conduct extensive evaluation on 95 datasets curated by TabZilla from OpenML, upon which we establish a new state-of-the-art with LoCalPFN -- even with respect to tuned tree-based models. Notably, we show a significant boost in performance compared to the base in-context model, demonstrating the efficacy of our approach and advancing the frontier of deep learning in tabular data.

artificial intelligence, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

Jun-7-2024

arXiv.org PDF

Add feedback

Genre:
- Research Report > New Finding (0.93)

Industry:
- Health & Medicine (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks > Deep Learning (1.00)
    - Statistical Learning (1.00)
  - Natural Language (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found