LLM Embeddings for Deep Learning on Tabular Data

Koloski, Boshko, Margeloiu, Andrei, Jiang, Xiangjian, Škrlj, Blaž, Simidjievski, Nikola, Jamnik, Mateja

Feb-17-2025–arXiv.org Artificial Intelligence

Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods deal with this heterogeneous nature of tabular data by employing separate type-specific encoding approaches. This limits the cross-table transfer potential and the exploitation of pre-trained knowledge. We propose a novel approach that first transforms tabular data into text, and then leverages pre-trained representations from LLMs to encode this data, resulting in a plug-and-play solution for improving deep-learning tabular methods. We demonstrate that our approach improves accuracy over competitive models, such as MLP, ResNet and FT-Transformer, by validating on seven classification datasets.
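The two-step idea in the abstract (serialize each feature to text, then encode that text with a pre-trained model) can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the "<column> is <value>" template, the function names, and above all `placeholder_embed`, which is a deterministic hash-based stand-in and not an LLM. In practice it would be swapped for a real pre-trained text encoder; the abstract does not specify which template or model the authors use.

```python
import hashlib
import math


def serialize_feature(name, value):
    """Turn one (column, value) pair into a short piece of text.

    The "<column> is <value>" template is a hypothetical choice.
    """
    return f"{name} is {value}"


def placeholder_embed(text, dim=8):
    """Stand-in for an LLM text encoder (NOT a real language model).

    Maps text to a deterministic unit-length vector derived from a hash,
    so the sketch runs without any external dependency.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    raw = [digest[i] / 255.0 - 0.5 for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def embed_row(row, embed=placeholder_embed):
    """Encode numerical and categorical features through the same text path."""
    return [embed(serialize_feature(name, value)) for name, value in row.items()]


# One row mixing a numerical and a categorical feature.
row = {"age": 39, "occupation": "teacher"}
tokens = embed_row(row)  # one vector per feature
```

The point of the sketch is that both feature types pass through one encoder rather than separate type-specific ones; the resulting per-feature vectors would then be fed to a downstream model such as an MLP, ResNet, or FT-Transformer.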

large language model, machine learning, natural language

Country:
- North America > United States (0.29)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.14)

Genre:
- Overview > Innovation (0.34)
- Research Report
  - Promising Solution (0.48)
  - New Finding (0.46)

Industry:
- Health & Medicine > Therapeutic Area (0.73)
- Education > Educational Setting (0.68)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
