Why In-Context Learning Transformers are Tabular Data Classifiers
Felix den Breejen, Sangmin Bae, Stephen Cha, Se-Young Yun
The recently introduced TabPFN pretrains an In-Context Learning (ICL) transformer on synthetic data to perform tabular data classification. As the synthetic data shares no features or labels with real-world data, the underlying mechanism behind the success of this method remains unclear. This study provides an explanation by demonstrating that ICL transformers acquire the ability to create complex decision boundaries during pretraining. To validate this claim, we develop a novel forest dataset generator that creates datasets which are unrealistic but have complex decision boundaries. Our experiments confirm the effectiveness of ICL transformers pretrained on this data. Furthermore, we create TabForestPFN, an ICL transformer pretrained on data from both the original TabPFN synthetic dataset generator and our forest dataset generator. By fine-tuning this model, we reach the current state-of-the-art on tabular data classification.
May-22-2024
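The abstract describes a forest dataset generator that produces unrealistic synthetic datasets with complex decision boundaries for ICL pretraining. As a rough illustration only, the sketch below labels random feature vectors with an unfitted, randomly grown decision tree, which yields intricate axis-aligned decision boundaries; all function names and parameters are hypothetical and not taken from the paper's actual implementation.

```python
# Hypothetical sketch of a forest-style synthetic dataset generator: draw random
# feature vectors and label them with a randomly grown (unfitted) decision tree,
# so the dataset is unrealistic but has a complex, axis-aligned decision boundary.
import numpy as np


def random_tree_labels(x, depth, n_classes, rng):
    """Recursively split the points with random axis-aligned thresholds."""
    if depth == 0 or len(x) <= 1:
        # Leaf: assign one random class to every point in this region.
        return np.full(len(x), rng.integers(n_classes))
    feature = rng.integers(x.shape[1])
    threshold = rng.uniform(x[:, feature].min(), x[:, feature].max())
    left = x[:, feature] <= threshold
    labels = np.empty(len(x), dtype=int)
    labels[left] = random_tree_labels(x[left], depth - 1, n_classes, rng)
    labels[~left] = random_tree_labels(x[~left], depth - 1, n_classes, rng)
    return labels


def generate_forest_dataset(n_samples=1024, n_features=10, n_classes=3,
                            depth=8, seed=0):
    """Sample Gaussian features and label them with one random tree."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, n_features))
    y = random_tree_labels(x, depth, n_classes, rng)
    return x, y


# Example: one synthetic pretraining task (features, labels).
X, y = generate_forest_dataset()
```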