LLM Embeddings for Deep Learning on Tabular Data
Boshko Koloski, Andrei Margeloiu, Xiangjian Jiang, Blaž Škrlj, Nikola Simidjievski, Mateja Jamnik
arXiv.org Artificial Intelligence
Tabular deep-learning methods require embedding numerical and categorical input features into high-dimensional spaces before processing them. Existing methods handle the heterogeneous nature of tabular data with separate, type-specific encoding approaches. This limits the potential for cross-table transfer and the exploitation of pre-trained knowledge. We propose a novel approach that first transforms tabular data into text and then leverages pre-trained representations from LLMs to encode it, resulting in a plug-and-play solution for improving deep-learning tabular methods. We demonstrate that our approach improves accuracy over competitive models, such as MLP, ResNet and FT-Transformer, on seven classification datasets.
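The abstract does not specify the text template or which LLM produces the embeddings, so the following is only a minimal Python sketch of the general idea under stated assumptions. Each cell is rendered as a hypothetical "<column> is <value>" string, and a hash-based `toy_text_encoder` (a hypothetical placeholder with no semantic content) stands in for a pre-trained LLM encoder. What it illustrates is that numerical and categorical features pass through one shared text path, yielding one embedding per feature that a backbone could consume either flattened (MLP-style) or as a token sequence (Transformer-style).

```python
import hashlib
import math
from typing import Callable, Dict, List, Union

Value = Union[int, float, str]


def serialize_row(row: Dict[str, Value]) -> List[str]:
    """Render each (column, value) cell as a short text string.

    The "<column> is <value>" template is a hypothetical choice for
    illustration; the paper's actual serialization may differ.
    """
    return [f"{col} is {val}" for col, val in row.items()]


def toy_text_encoder(text: str, dim: int = 8) -> List[float]:
    """Hypothetical stand-in for a pre-trained LLM text encoder.

    Maps text to a deterministic unit-norm vector derived from SHA-256
    bytes so the sketch runs without a model download. It captures no
    meaning; a real setup would call an actual LLM embedding model here.
    """
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Scale each byte from [0, 255] to [-1, 1].
    vec = [(digest[i % len(digest)] / 255.0) * 2.0 - 1.0 for i in range(dim)]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def embed_row(
    row: Dict[str, Value],
    encoder: Callable[[str], List[float]] = toy_text_encoder,
) -> List[List[float]]:
    """Return one embedding per feature via the same text path,
    regardless of whether the cell is numerical or categorical."""
    return [encoder(text) for text in serialize_row(row)]


def flatten_for_mlp(tokens: List[List[float]]) -> List[float]:
    """Concatenate per-feature embeddings into a single MLP input vector."""
    return [x for tok in tokens for x in tok]


if __name__ == "__main__":
    row = {"age": 42, "workclass": "Private", "hours_per_week": 40.0}
    print(serialize_row(row))
    tokens = embed_row(row)            # 3 features x 8 dims (token view)
    print(len(tokens), len(tokens[0]))
    print(len(flatten_for_mlp(tokens)))  # 24 (flattened view)
```

Keeping the encoder as a swappable `encoder` argument mirrors the plug-and-play framing: under this sketch's assumptions, only the feature-embedding step changes, while the downstream MLP, ResNet or FT-Transformer backbone stays as it is.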
Feb-17-2025