CARTE: Pretraining and Transfer for Tabular Learning
Myung Jun Kim, Léo Grinsztajn, Gaël Varoquaux
Pretrained deep-learning models are the go-to solution for images and text. However, for tabular data the standard is still to train tree-based models. Indeed, transfer learning on tables hits the challenge of data integration: finding correspondences in the entries (entity matching), where different words may denote the same entity, and correspondences across columns (schema matching), which may come in different orders, under different names... We propose a neural architecture that does not need such correspondences. As a result, we can pretrain it on background data that has not been matched. The architecture -- CARTE, for Context Aware Representation of Table Entries -- uses a graph representation of tabular (or relational) data to process tables with different columns, string embeddings of entries and column names to model an open vocabulary, and a graph-attentional network to contextualize entries with column names and neighboring entries. An extensive benchmark shows that CARTE facilitates learning, outperforming a solid set of baselines including the best tree-based models. CARTE also enables joint learning across tables with unmatched columns, enhancing a small table with bigger ones. CARTE opens the door to large pretrained models for tabular data.
arXiv.org Artificial Intelligence
May-31-2024
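
Since the abstract describes the architecture only in prose, here is a minimal sketch of the core idea: each table row becomes a small star graph whose leaf nodes are its cell values, edges carry column-name embeddings, and one graph-attention step contextualizes entries with their column names and neighboring entries. Everything in the sketch (the embedding dimension, the random stand-in for a pretrained string embedder, the toy row) is an assumption for illustration, not the authors' code.

```python
# A minimal sketch of the idea, not the authors' implementation: the
# embedding dimension, the random stand-in string embedder, and the toy
# row below are assumptions for illustration only.
import torch
import torch.nn as nn

D = 32  # embedding dimension (assumption)

def embed_string(s: str, dim: int = D) -> torch.Tensor:
    """Deterministic stand-in for a pretrained string embedder."""
    seed = sum(ord(c) * 31 ** i for i, c in enumerate(s)) % (2 ** 31)
    g = torch.Generator().manual_seed(seed)
    return torch.randn(dim, generator=g)

class RowGraphAttention(nn.Module):
    """One attention step over a row's star graph: a center (row) node
    attends to cell nodes whose keys and values are modulated by the
    embedding of their column name, i.e. the edge feature."""
    def __init__(self, dim: int = D):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)

    def forward(self, center, cells, cols):
        keys = self.k(cells + cols)   # column name contextualizes the cell
        vals = self.v(cells + cols)
        attn = torch.softmax(self.q(center) @ keys.T / D ** 0.5, dim=-1)
        return attn @ vals            # contextualized row representation

# Toy row with arbitrary column names (hypothetical data): no schema
# matching is needed, since columns enter only through their embeddings.
row = {"City": "Paris", "Population": "2.1M"}
cols = torch.stack([embed_string(c) for c in row])
cells = torch.stack([embed_string(v) for v in row.values()])
center = cells.mean(dim=0)  # initialize the row node from its cells

layer = RowGraphAttention()
print(layer(center, cells, cols).shape)  # torch.Size([32])
```

Because columns enter only through their string embeddings, nothing in this construction depends on a fixed schema, which is what lets the real model be pretrained on background tables whose columns were never matched.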