TabRet: Pre-training Transformer-based Tabular Models for Unseen Columns

Soma Onishi, Kenta Oono, Kohei Hayashi

arXiv.org Artificial Intelligence 

TabRet is designed to work on downstream tasks that contain columns not seen in pre-training. Unlike other methods, TabRet has an extra learning step before fine-tuning called retokenizing, which calibrates feature embeddings based on the masked autoencoding loss. In experiments, we pre-trained TabRet on a large collection of public health surveys and fine-tuned it on classification tasks in healthcare, where TabRet achieved the best AUC performance on four datasets. In addition, an ablation study shows that retokenizing and random shuffle augmentation of columns during pre-training contributed to the performance gains.

Transformer-based pre-trained models have been successfully applied to various domains such as text and images (Bommasani et al., 2021). The Transformer-like architecture consists of two modules: a tokenizer, which converts an input feature into a token embedding, and a mixer, which repeatedly manipulates the tokens with attention and feed-forward networks (FFNs) (Lin et al., 2021; Yu et al., 2022). During pre-training, both modules are trained to learn representations that generalize to downstream tasks. What has often been overlooked in the literature are scenarios where the input space changes between the pretext and downstream tasks. A supervised problem on tabular data is a typical example, where rows (records) represent data points and columns represent input features. Since tabular datasets are rarely as large as text or image corpora, pre-trained models are expected to be beneficial (Borisov et al., 2022).
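To make the tokenizer/mixer split and the retokenizing step more concrete, below is a minimal PyTorch sketch. It is not the authors' implementation: the names (ColumnTokenizer, TabularMaskedAutoencoder, retokenize), the restriction to numerical columns, and the simple masked-reconstruction loss are illustrative assumptions standing in for the paper's masked autoencoding objective.

```python
# Minimal sketch (not the TabRet codebase): per-column tokenizers feed a
# Transformer mixer trained with a masked-reconstruction loss; retokenize()
# adds tokenizers for unseen columns and trains only them before fine-tuning.
import torch
import torch.nn as nn


class ColumnTokenizer(nn.Module):
    """Maps one numerical column to a token embedding (one tokenizer per column)."""

    def __init__(self, d_model: int):
        super().__init__()
        self.proj = nn.Linear(1, d_model)

    def forward(self, x):                      # x: (batch,)
        return self.proj(x.unsqueeze(-1))      # (batch, d_model)


class TabularMaskedAutoencoder(nn.Module):
    def __init__(self, columns, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.tokenizers = nn.ModuleDict({c: ColumnTokenizer(d_model) for c in columns})
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.mixer = nn.TransformerEncoder(layer, num_layers)   # attention + FFN blocks
        self.decoder = nn.Linear(d_model, 1)                    # reconstructs masked values
        self.mask_token = nn.Parameter(torch.zeros(d_model))

    def forward(self, batch, mask_ratio=0.3):
        cols = list(batch.keys())
        tokens = torch.stack([self.tokenizers[c](batch[c]) for c in cols], dim=1)  # (B, C, d)
        targets = torch.stack([batch[c] for c in cols], dim=1)                     # (B, C)
        mask = torch.rand(tokens.shape[:2], device=tokens.device) < mask_ratio
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token, tokens)
        recon = self.decoder(self.mixer(tokens)).squeeze(-1)                       # (B, C)
        return ((recon - targets) ** 2 * mask).sum() / mask.sum().clamp(min=1)


def retokenize(model, new_columns, downstream_batch, steps=100, lr=1e-3):
    """Add tokenizers for unseen columns and train only them on the masked
    autoencoding loss; the pre-trained mixer is kept frozen in this sketch."""
    d_model = model.mask_token.numel()
    for c in new_columns:
        model.tokenizers[c] = ColumnTokenizer(d_model)
    for p in model.mixer.parameters():
        p.requires_grad_(False)
    new_params = [p for c in new_columns for p in model.tokenizers[c].parameters()]
    opt = torch.optim.Adam(new_params, lr=lr)
    for _ in range(steps):
        loss = model(downstream_batch)
        opt.zero_grad()
        loss.backward()
        opt.step()


if __name__ == "__main__":
    # Hypothetical column names; real pre-training would use public health surveys.
    pretrain_batch = {"age": torch.randn(32), "bmi": torch.randn(32)}
    model = TabularMaskedAutoencoder(columns=list(pretrain_batch))
    model(pretrain_batch).backward()  # one pre-training step on the seen columns
    downstream = {**pretrain_batch, "blood_pressure": torch.randn(32)}
    retokenize(model, ["blood_pressure"], downstream, steps=5)
```

Freezing the mixer while updating only the new tokenizers is one plausible reading of "calibrating feature embeddings" before fine-tuning; the point of the sketch is simply that the unseen columns are mapped into the representation space learned during pre-training via the same masked autoencoding loss.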
