Golestan, Keyvan
DRESS: Disentangled Representation-based Self-Supervised Meta-Learning for Diverse Tasks
Cui, Wei, Wu, Tongzi, Cresswell, Jesse C., Sui, Yi, Golestan, Keyvan
Meta-learning represents a strong class of approaches for solving few-shot learning tasks. Nonetheless, recent research suggests that simply pre-training a generic encoder can potentially surpass meta-learning algorithms. In this paper, we first discuss the reasons why meta-learning fails to stand out in these few-shot learning experiments, and hypothesize that it is because the few-shot learning tasks lack diversity. We propose DRESS, a task-agnostic Disentangled REpresentation-based Self-Supervised meta-learning approach that enables fast model adaptation on highly diversified few-shot learning tasks. Specifically, DRESS utilizes disentangled representation learning to create self-supervised tasks that can fuel the meta-training process. Furthermore, we propose a class-partition-based metric for quantifying task diversity directly in the input space. We validate the effectiveness of DRESS through experiments on datasets with multiple factors of variation and varying complexity. The results suggest that DRESS outperforms competing methods on the majority of the datasets and task setups. Through this paper, we advocate for a re-examination of proper setups for task adaptation studies, and aim to reignite interest in the potential of meta-learning for solving few-shot learning tasks via disentangled representations.
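A minimal sketch of how self-supervised few-shot tasks can be constructed from disentangled latent factors, in the spirit of DRESS: each latent dimension of a pre-trained disentangled encoder is discretized into pseudo-classes that define one episode. The encoder, the quantile binning, and the episode sizes below are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: turn disentangled latent factors into self-supervised few-shot tasks.
# The "encoder" is replaced by random codes here; in practice Z = encode(X).
import numpy as np

def make_task(Z, dim, n_way=5, k_shot=1, n_query=5, rng=None):
    """Build one few-shot episode from a single latent dimension.

    Z   : (N, D) array of disentangled latent codes for unlabelled data.
    dim : which latent factor defines the episode's class structure.
    """
    rng = rng or np.random.default_rng()
    # Discretize the chosen factor into n_way quantile bins -> pseudo-classes.
    edges = np.quantile(Z[:, dim], np.linspace(0, 1, n_way + 1))
    labels = np.clip(np.searchsorted(edges, Z[:, dim], side="right") - 1, 0, n_way - 1)

    support, query = [], []
    for c in range(n_way):
        idx = rng.permutation(np.where(labels == c)[0])
        support.append(idx[:k_shot])
        query.append(idx[k_shot:k_shot + n_query])
    return np.concatenate(support), np.concatenate(query), labels

# Usage: sampling a different latent factor per episode yields meta-training
# tasks that are diverse along distinct factors of variation.
Z = np.random.randn(1000, 10)          # stand-in for encode(X)
rng = np.random.default_rng(0)
support_idx, query_idx, pseudo_labels = make_task(Z, dim=rng.integers(10), rng=rng)
```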
TabDPT: Scaling Tabular Foundation Models
Ma, Junwei, Thomas, Valentin, Hosseinzadeh, Rasa, Kamkari, Hamidreza, Labach, Alex, Cresswell, Jesse C., Golestan, Keyvan, Yu, Guangwei, Volkovs, Maksims, Caterini, Anthony L.
The challenges faced by neural networks on tabular data are well-documented and have hampered the progress of tabular foundation models. Techniques leveraging in-context learning (ICL) have shown promise here, allowing for dynamic adaptation to unseen data. ICL can provide predictions for entirely new datasets without further training or hyperparameter tuning, thereby providing very fast inference when encountering a novel task. However, scaling ICL for tabular data remains an issue: approaches based on large language models cannot efficiently process numeric tables, and tabular-specific techniques have not been able to effectively harness the power of real data to improve performance and generalization. We overcome these challenges by training tabular-specific ICL-based architectures on real data with self-supervised learning and retrieval, combining the best of both worlds. Our resulting model - the Tabular Discriminative Pre-trained Transformer (TabDPT) - achieves state-of-the-art performance on the CC18 (classification) and CTR23 (regression) benchmarks with no task-specific fine-tuning, demonstrating the adaptability and speed of ICL once the model is pre-trained. TabDPT also demonstrates strong scaling as both model size and the amount of available data increase, pointing towards future improvements simply through the curation of larger tabular pre-training datasets and the training of larger models.
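As a rough illustration of the ICL interface that tabular foundation models such as TabDPT expose, the sketch below passes labelled context rows and unlabelled query rows through a transformer in a single forward pass, with no per-dataset training. The architecture (a plain nn.TransformerEncoder, the embedding scheme, and the "unknown"-label token) is an illustrative assumption rather than TabDPT's actual design.

```python
# Sketch of in-context learning for tabular classification: context rows carry
# their labels, query rows carry an "unknown" token, and predictions are read
# off the query positions in one forward pass.
import torch
import torch.nn as nn

class TabularICL(nn.Module):
    def __init__(self, n_features, d_model=128, n_classes=10):
        super().__init__()
        self.x_embed = nn.Linear(n_features, d_model)
        self.y_embed = nn.Embedding(n_classes + 1, d_model)  # last index = "unknown"
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(d_model, n_classes)
        self.unknown = n_classes

    def forward(self, x_ctx, y_ctx, x_query):
        y_q = torch.full((x_query.shape[0],), self.unknown, dtype=torch.long)
        tokens = torch.cat([
            self.x_embed(x_ctx) + self.y_embed(y_ctx),
            self.x_embed(x_query) + self.y_embed(y_q),
        ]).unsqueeze(0)                       # (1, n_ctx + n_query, d_model)
        h = self.encoder(tokens).squeeze(0)
        return self.head(h[x_ctx.shape[0]:])  # logits for the query rows only

# Inference on a new dataset: no fine-tuning, just a forward pass.
model = TabularICL(n_features=8)
x_ctx, y_ctx = torch.randn(64, 8), torch.randint(0, 10, (64,))
logits = model(x_ctx, y_ctx, torch.randn(16, 8))   # (16, n_classes)
```

For simplicity this sketch lets query rows attend to one another; practical tabular ICL models typically apply an attention mask so that each prediction depends only on the context rows and the query row itself.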
Retrieval & Fine-Tuning for In-Context Tabular Models
Thomas, Valentin, Ma, Junwei, Hosseinzadeh, Rasa, Golestan, Keyvan, Yu, Guangwei, Volkovs, Maksims, Caterini, Anthony
Tabular data is a pervasive modality spanning a wide range of domains, and its inherent diversity poses a considerable challenge for deep learning. Recent advancements using transformer-based in-context learning have shown promise on smaller and less complex datasets, but have struggled to scale to larger and more complex ones. To address this limitation, we propose a combination of retrieval and fine-tuning: we adapt the transformer to a local subset of the data by collecting nearest neighbours, and then perform task-specific fine-tuning with this retrieved set of neighbours in context. Using TabPFN as the base model -- currently the best tabular in-context learner -- and applying our retrieval and fine-tuning scheme on top results in what we call a locally-calibrated PFN, or LoCalPFN. We conduct extensive evaluation on 95 datasets curated by TabZilla from OpenML, upon which we establish a new state-of-the-art with LoCalPFN -- even with respect to tuned tree-based models. Notably, we show a significant boost in performance compared to the base in-context model, demonstrating the efficacy of our approach and advancing the frontier of deep learning in tabular data.
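The retrieval-plus-fine-tuning recipe can be sketched as follows, assuming an in-context model with the interface model(x_ctx, y_ctx, x_query) -> logits (as in the sketch above). The neighbourhood size, optimizer settings, and training loop are illustrative assumptions rather than the exact LoCalPFN procedure.

```python
# Sketch: adapt a tabular in-context learner by (1) retrieving each query's
# nearest training rows as its local context and (2) fine-tuning the model
# with those retrieved neighbours in context.
import torch
from sklearn.neighbors import NearestNeighbors

def local_context(x_train, y_train, x_query, k=64):
    """Retrieve the k nearest training rows for each query row."""
    index = NearestNeighbors(n_neighbors=k).fit(x_train.numpy())
    idx = index.kneighbors(x_query.numpy(), return_distance=False)
    return x_train[idx], y_train[idx]        # (n_query, k, d), (n_query, k)

def fine_tune(model, x_train, y_train, steps=100, lr=1e-4, k=64, batch=8):
    """Task-specific fine-tuning with retrieved neighbours in context."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        q = torch.randint(len(x_train), (batch,))
        # Note: for a real setup the query point itself should be excluded
        # from its own retrieved context to avoid label leakage.
        x_ctx, y_ctx = local_context(x_train, y_train, x_train[q], k=k)
        loss = 0.0
        for i in range(batch):               # one local context per query row
            logits = model(x_ctx[i], y_ctx[i], x_train[q[i]].unsqueeze(0))
            loss = loss + torch.nn.functional.cross_entropy(
                logits, y_train[q[i]].unsqueeze(0))
        opt.zero_grad()
        (loss / batch).backward()
        opt.step()
```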
Tabular Data Contrastive Learning via Class-Conditioned and Feature-Correlation Based Augmentation
Cui, Wei, Hosseinzadeh, Rasa, Ma, Junwei, Wu, Tongzi, Sui, Yi, Golestan, Keyvan
Contrastive learning is a model pre-training technique that first creates similar views of the original data, and then encourages the data and its corresponding views to be close in the embedding space. Contrastive learning has seen success on image and natural language data, thanks to domain-specific augmentation techniques that are both intuitive and effective. Nonetheless, in the tabular domain, the predominant augmentation technique for creating views is to corrupt tabular entries by swapping values, which is not as sound or effective. We propose a simple yet powerful improvement to this augmentation technique: corrupting tabular data conditioned on class identity. Specifically, when corrupting a specific tabular entry of an anchor row, instead of uniformly sampling a value in the same feature column from the entire table, we only sample from rows identified to be within the same class as the anchor row. We assume the semi-supervised learning setting, and adopt the pseudo-labeling technique to obtain class identities over all table rows. We also explore the novel idea of selecting the features to be corrupted based on feature correlation structures. Extensive experiments show that the proposed approach consistently outperforms the conventional corruption method for tabular data classification tasks. Our code is available at https://github.com/willtop/Tabular-Class-Conditioned-SSL.
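A minimal sketch of the class-conditioned corruption described above: a corrupted entry is resampled only from rows sharing the anchor's (pseudo-)label, rather than from the whole column. The corruption rate and the uniform choice of columns are illustrative assumptions; the paper additionally explores correlation-based selection of the features to corrupt.

```python
# Sketch: class-conditioned corruption for tabular contrastive pre-training.
import numpy as np

def class_conditioned_corrupt(X, labels, corruption_rate=0.3, rng=None):
    """Return a corrupted view of X for contrastive pre-training.

    X      : (N, D) array of feature values.
    labels : (N,) array of (pseudo-)labels, e.g. from a pseudo-labeling model.
    """
    rng = rng or np.random.default_rng()
    X_view = X.copy()
    n, d = X.shape
    for i in range(n):
        same_class = np.where(labels == labels[i])[0]
        # Choose which features of the anchor row to corrupt
        # (uniformly here; correlation-based selection is another option).
        cols = rng.choice(d, size=max(1, int(corruption_rate * d)), replace=False)
        for j in cols:
            # Swap in the value of a same-class row for this feature column.
            X_view[i, j] = X[rng.choice(same_class), j]
    return X_view

# Usage: the anchor X and its view X_view form a positive pair for a
# contrastive objective such as InfoNCE / NT-Xent.
X = np.random.randn(256, 12)
pseudo_labels = np.random.randint(0, 3, size=256)
X_view = class_conditioned_corrupt(X, pseudo_labels)
```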