relbench
RelBench: A Benchmark for Deep Learning on Relational Databases
RelBench provides databases and tasks spanning diverse domains, scales, and database dimensions, and is intended to be a foundational infrastructure for future research in this direction. We use RelBench to conduct the first comprehensive empirical study of graph neural network (GNN) based predictive models on relational data, as recently proposed by Fey et al. 2024. End-to-end learned GNNs are capable fully exploiting the predictive signal encoded in links between entities, marking a significant shift away from the dominant paradigm of manual feature engineering combined with tabular machine learning. To thoroughly evaluate GNNs against the prior gold-standard we conduct a user study, where an experienced data scientist manually engineers features for each task. In this study, GNNs learn better models whilst reducing human work needed by more than an order of magnitude. This result demonstrates the power of GNNs for solving predictive tasks in relational databases, opening up new research opportunities.
RelBench: A Benchmark for Deep Learning on Relational Databases
RelBench provides databases and tasks spanning diverse domains, scales, and database dimensions, and is intended to be a foundational infrastructure for future research in this direction. We use RelBench to conduct the first comprehensive empirical study of graph neural network (GNN) based predictive models on relational data, as recently proposed by Fey et al. 2024. End-to-end learned GNNs are capable fully exploiting the predictive signal encoded in links between entities, marking a significant shift away from the dominant paradigm of manual feature engineering combined with tabular machine learning. To thoroughly evaluate GNNs against the prior gold-standard we conduct a user study, where an experienced data scientist manually engineers features for each task. In this study, GNNs learn better models whilst reducing human work needed by more than an order of magnitude.
Relational Graph Transformer
Dwivedi, Vijay Prakash, Jaladi, Sri, Shen, Yangyi, López, Federico, Kanatsoulis, Charilaos I., Puri, Rishi, Fey, Matthias, Leskovec, Jure
Relational Deep Learning (RDL) is a promising approach for building state-of-the-art predictive models on multi-table relational data by representing it as a heterogeneous temporal graph. However, commonly used Graph Neural Network models suffer from fundamental limitations in capturing complex structural patterns and long-range dependencies that are inherent in relational data. While Graph Transformers have emerged as powerful alternatives to GNNs on general graphs, applying them to relational entity graphs presents unique challenges: (i) Traditional positional encodings fail to generalize to massive, heterogeneous graphs; (ii) existing architectures cannot model the temporal dynamics and schema constraints of relational data; (iii) existing tokenization schemes lose critical structural information. Here we introduce the Relational Graph Transformer (RelGT), the first graph transformer architecture designed specifically for relational tables. RelGT employs a novel multi-element tokenization strategy that decomposes each node into five components (features, type, hop distance, time, and local structure), enabling efficient encoding of heterogeneity, temporality, and topology without expensive precomputation. Our architecture combines local attention over sampled subgraphs with global attention to learnable centroids, incorporating both local and database-wide representations. Across 21 tasks from the RelBench benchmark, RelGT consistently matches or outperforms GNN baselines by up to 18%, establishing Graph Transformers as a powerful architecture for Relational Deep Learning.
Tackling prediction tasks in relational databases with LLMs
Wydmuch, Marek, Borchmann, Łukasz, Graliński, Filip
Though large language models (LLMs) have demonstrated exceptional performance across numerous problems, their application to predictive tasks in relational databases remains largely unexplored. In this work, we address the notion that LLMs cannot yield satisfactory results on relational databases due to their interconnected tables, complex relationships, and heterogeneous data types. Using the recently introduced RelBench benchmark, we demonstrate that even a straightforward application of LLMs achieves competitive performance on these tasks. These findings establish LLMs as a promising new baseline for ML on relational databases and encourage further research in this direction.