Code and Pixels: Multi-Modal Contrastive Pre-training for Enhanced Tabular Data Analysis
Kankana Roy, Lars Krämer, Sebastian Domaschke, Malik Haris, Roland Aydin, Fabian Isensee, Martin Held
Learning from tabular data is of paramount importance, as it complements the conventional analysis of image and video data by providing a rich source of structured information that is often critical for comprehensive understanding and decision-making. We present Multi-task Contrastive Masked Tabular Modeling (MT-CMTM), a novel method that enhances tabular models by leveraging the correlation between tabular data and corresponding images. MT-CMTM employs a dual strategy combining contrastive learning with masked tabular modeling, optimizing the synergy between these data modalities. Central to our approach is a 1D Convolutional Neural Network with residual connections and an attention mechanism (1D-ResNet-CBAM), designed to process tabular data efficiently without relying on images. This enables MT-CMTM to handle purely tabular data for downstream tasks, eliminating the need for potentially costly image acquisition and processing. We evaluated MT-CMTM on the DVM car dataset, which is uniquely suited to this scenario, and the newly developed HIPMP dataset, which connects membrane fabrication parameters with image data. MT-CMTM outperforms the proposed tabular 1D-ResNet-CBAM trained from scratch, achieving a 1.48% relative improvement in MSE on HIPMP and a 2.38% increase in absolute accuracy on DVM. These results demonstrate MT-CMTM's robustness and its potential to advance the field of multi-modal learning.
arXiv.org Artificial Intelligence
Jan-13-2025
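
The abstract names a 1D CNN with residual connections and a CBAM-style attention mechanism as the tabular encoder. Below is a minimal, illustrative sketch of one such block in PyTorch; the layer widths, kernel sizes, reduction ratio, and block layout are assumptions, since the abstract does not specify the architecture's details.

```python
# Illustrative sketch of a 1D residual block with CBAM-style attention.
# All hyperparameters below (kernel sizes, reduction ratio) are assumptions.
import torch
import torch.nn as nn

class ChannelAttention1d(nn.Module):
    """Channel attention: shared MLP over avg- and max-pooled descriptors."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, L)
        avg = self.mlp(x.mean(dim=2))                 # (B, C)
        mx = self.mlp(x.amax(dim=2))                  # (B, C)
        scale = torch.sigmoid(avg + mx).unsqueeze(2)  # (B, C, 1)
        return x * scale

class SpatialAttention1d(nn.Module):
    """Spatial attention: conv over stacked channel-wise avg/max maps."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, L)
        avg = x.mean(dim=1, keepdim=True)             # (B, 1, L)
        mx = x.amax(dim=1, keepdim=True)              # (B, 1, L)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale

class ResidualCBAMBlock1d(nn.Module):
    """Conv1d -> BN -> ReLU -> Conv1d -> BN, CBAM attention, then skip."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, 3, padding=1),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
            nn.Conv1d(channels, channels, 3, padding=1),
            nn.BatchNorm1d(channels),
        )
        self.ca = ChannelAttention1d(channels)
        self.sa = SpatialAttention1d()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.sa(self.ca(self.body(x)))
        return torch.relu(out + x)
```

A tabular record with F features would typically enter such a network as a (batch, 1, F) tensor, with an initial Conv1d stem lifting it to the block's channel width; that stem is omitted here for brevity.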
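
The pre-training strategy combines contrastive alignment between tabular and image embeddings with masked tabular modeling. The sketch below pairs a symmetric InfoNCE term with a masked-reconstruction MSE; the temperature, mask ratio, loss weight `lam`, and masking-by-zeroing scheme are assumptions for illustration, not the paper's confirmed formulation.

```python
# Illustrative sketch of a dual pre-training objective: contrastive
# tabular-image alignment plus masked tabular reconstruction.
# Hyperparameters and the masking scheme are assumptions.
import torch
import torch.nn.functional as F

def info_nce(z_tab, z_img, temperature=0.1):
    """Symmetric contrastive loss over a batch of paired embeddings."""
    z_tab = F.normalize(z_tab, dim=1)
    z_img = F.normalize(z_img, dim=1)
    logits = z_tab @ z_img.t() / temperature   # (B, B) similarity matrix
    targets = torch.arange(z_tab.size(0), device=z_tab.device)
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

def pretrain_loss(tab_encoder, img_encoder, decoder, x_tab, x_img,
                  mask_ratio=0.3, lam=1.0):
    """Combined objective for one batch of paired tabular/image samples."""
    # Masked tabular modeling: zero out a random fraction of features.
    mask = (torch.rand_like(x_tab) < mask_ratio).float()
    z_tab = tab_encoder(x_tab * (1 - mask))
    z_img = img_encoder(x_img)
    # Reconstruct the full tabular vector; penalize only masked entries.
    x_hat = decoder(z_tab)
    recon = ((x_hat - x_tab) ** 2 * mask).sum() / mask.sum().clamp(min=1)
    return info_nce(z_tab, z_img) + lam * recon
```

Because both losses are computed from the tabular embedding, the image encoder is needed only during pre-training; at inference the tabular branch runs alone, consistent with the abstract's claim that downstream tasks require no images.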