LLM Embeddings for Deep Learning on Tabular Data
Koloski, Boshko, Margeloiu, Andrei, Jiang, Xiangjian, Škrlj, Blaž, Simidjievski, Nikola, Jamnik, Mateja
arXiv.org Artificial Intelligence
Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods handle the heterogeneous nature of tabular data by employing separate type-specific encoding approaches. This limits the potential for cross-table transfer and the exploitation of pre-trained knowledge. We propose a novel approach that first transforms tabular data into text, and then leverages pre-trained representations from LLMs to encode this data, resulting in a plug-and-play solution for improving deep-learning tabular methods. We demonstrate that our approach improves accuracy over competitive models, such as MLP, ResNet and FT-Transformer, by validating on seven classification datasets.
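The two steps described in the abstract (serialize tabular data to text, then encode the text with a pre-trained model) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the "name is value" template, the per-feature granularity, and the names `serialize_feature`, `toy_encoder` and `embed_row` are all assumptions made for this example. In particular, `toy_encoder` is a hash-based stand-in that carries no pre-trained knowledge; it only marks the place where an LLM encoder would be plugged in.

```python
import hashlib
from typing import Callable, Dict, List, Union

Value = Union[int, float, str]
Encoder = Callable[[str], List[float]]


def serialize_feature(name: str, value: Value) -> str:
    """Turn one (feature name, value) pair into text (hypothetical template)."""
    return f"{name} is {value}"


def toy_encoder(text: str, dim: int = 8) -> List[float]:
    """Placeholder for a pre-trained LLM encoder (assumption, not a real model).

    Hashes the text into a fixed-size vector so the sketch runs offline.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Map the first `dim` bytes (0..255) to floats in [-1, 1].
    return [b / 127.5 - 1.0 for b in digest[:dim]]


def embed_row(row: Dict[str, Value], encoder: Encoder = toy_encoder) -> List[List[float]]:
    """Embed every feature with the same text encoder, whatever its type."""
    return [encoder(serialize_feature(name, value)) for name, value in row.items()]


# Numerical and categorical features go through one shared code path.
row = {"age": 42, "occupation": "teacher", "hours_per_week": 37.5}
tokens = embed_row(row)  # one fixed-size vector per feature
```

Because every feature passes through the same text encoder, no type-specific embedding layers are needed in this sketch. In a real setup, one would presumably replace `toy_encoder` with a call to a pre-trained language model and pass the resulting per-feature vectors to a backbone such as an MLP, ResNet or FT-Transformer.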
Feb-17-2025
- Country:
- North America > United States (0.29)
- Europe > United Kingdom
- England > Cambridgeshire > Cambridge (0.14)
- Genre:
- Overview > Innovation (0.34)
- Research Report
- Promising Solution (0.48)
- New Finding (0.46)
- Industry:
- Health & Medicine > Therapeutic Area (0.73)
- Education > Educational Setting (0.68)
- Technology: