AITopics | relational database

Collaborating Authors

relational database

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

SM3-Text-to-Query: Synthetic Multi-Model Medical Text-to-Query Benchmark

Neural Information Processing SystemsMar-21-2026, 20:47:02 GMT

Electronic health records (EHRs) are stored in various database systems with different database models on heterogeneous storage architectures, such as relational databases, document stores, or graph databases. These different database models have a big impact on query complexity and performance. While this has been a known fact in database research, its implications for the growing number of Text-to-Query systems have surprisingly not been investigated so far.In this paper, we present SM3-Text-to-Query, the first multi-model medical Text-to-Query benchmark based on synthetic patient data from Synthea, following the SNOMED-CT taxonomy---a widely used knowledge graph ontology covering medical terminology. SM3-Text-to-Query provides data representations for relational databases (PostgreSQL), document stores (MongoDB), and graph databases (Neo4j and GraphDB (RDF)), allowing the evaluation across four popular query languages, namely SQL, MQL, Cypher, and SPARQL.We systematically and manually develop 408 template questions, which we augment to construct a benchmark of 10K diverse natural language question/query pairs for these four query languages (40K pairs overall). On our dataset, we evaluate several common in-context-learning (ICL) approaches for a set of representative closed and open-source LLMs.Our evaluation sheds light on the trade-offs between database models and query languages for different ICL strategies and LLMs. Last,SM3-Text-to-Query is easily extendable to additional query languages or real, standard-based patient databases.

artificial intelligence, large language model, natural language, (12 more...)

Neural Information Processing Systems

Industry: Health & Medicine > Health Care Technology > Medical Record (0.59)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.59)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.50)

Add feedback

ERBench: An Entity-Relationship based Automatically Verifiable Hallucination Benchmark for Large Language Models

Neural Information Processing SystemsMar-20-2026, 20:47:09 GMT

Large language models (LLMs) have achieved unprecedented performances in various applications, yet evaluating them is still challenging. Existing benchmarks are either manually constructed or are automatic, but lack the ability to evaluate the thought process of LLMs with arbitrary complexity. We contend that utilizing existing relational databases based on the entity-relationship (ER) model is a promising approach for constructing benchmarks as they contain structured knowledge that can be used to question LLMs. Unlike knowledge graphs, which are also used to evaluate LLMs, relational databases have integrity constraints that can be used to better construct complex in-depth questions and verify answers: (1) functional dependencies can be used to pinpoint critical keywords that an LLM must know to properly answer a given question containing certain attribute values; and (2) foreign key constraints can be used to join relations and construct multi-hop questions, which can be arbitrarily long and used to debug intermediate answers. We thus propose ERBench, which uses these integrity constraints to convert any database into an LLM benchmark. ERBench supports continuous evaluation as databases change, multimodal questions, and various prompt engineering techniques. In our experiments, we construct LLM benchmarks using databases of multiple domains and make an extensive comparison of contemporary LLMs. We show how ERBench can properly evaluate any LLM by not only checking for answer correctness, but also effectively verifying the rationales by looking for the right keywords.

artificial intelligence, large language model, natural language, (12 more...)

Neural Information Processing Systems

Genre: Research Report (0.59)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

WikiDBs: A Large-Scale Corpus Of Relational Databases From Wikidata

Neural Information Processing SystemsMar-20-2026, 06:12:42 GMT

Deep learning on tabular data, and particularly tabular representation learning, has recently gained growing interest. However, representation learning for relational databases with multiple tables is still an underexplored area, which may be attributed to the lack of openly available resources. To support the development of foundation models for tabular data and relational databases, we introduce WikiDBs, a novel open-source corpus of 100,000 relational databases. Each database consists of multiple tables connected by foreign keys. The corpus is based on Wikidata and aims to follow certain characteristics of real-world databases. In this paper, we describe the dataset and our method for creating it. By making our code publicly available, we enable others to create tailored versions of the dataset, for example, by creating databases in different languages. Finally, we conduct a set of initial experiments to showcase how WikiDBs can be used to train for data engineering tasks, such as missing value imputation and column type annotation.

artificial intelligence, database, machine learning, (8 more...)

Neural Information Processing Systems

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Machine Learning (0.41)

Add feedback

RelBench: A Benchmark for Deep Learning on Relational Databases

Neural Information Processing SystemsMar-19-2026, 04:08:37 GMT

RelBench provides databases and tasks spanning diverse domains, scales, and database dimensions, and is intended to be a foundational infrastructure for future research in this direction. We use RelBench to conduct the first comprehensive empirical study of graph neural network (GNN) based predictive models on relational data, as recently proposed by Fey et al. 2024. End-to-end learned GNNs are capable fully exploiting the predictive signal encoded in links between entities, marking a significant shift away from the dominant paradigm of manual feature engineering combined with tabular machine learning. To thoroughly evaluate GNNs against the prior gold-standard we conduct a user study, where an experienced data scientist manually engineers features for each task. In this study, GNNs learn better models whilst reducing human work needed by more than an order of magnitude. This result demonstrates the power of GNNs for solving predictive tasks in relational databases, opening up new research opportunities.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.60)

Technology:

Information Technology > Databases (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.43)

Add feedback

4847447cae911dde321ca19afe7d8707-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-12-2026, 10:01:29 GMT

data mining, large language model, machine learning, (23 more...)

Neural Information Processing Systems

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Austria > Vienna (0.14)
Europe > Czechia > Prague (0.04)
(18 more...)

Genre:

Overview (0.68)
Research Report (0.46)

Technology:

Information Technology > Databases (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
(4 more...)

Add feedback

4DBInfer: A4D Benchmarking Toolbox for Graph-Centric Predictive Modeling on RDBs Minjie Wang

Neural Information Processing SystemsFeb-10-2026, 15:47:51 GMT

Given a relational database (RDB), how can we predict missing column values in some target table of interest? Although RDBs store vast amounts of rich, informative data spread across interconnected tables, the progress of predictive machine learning models as applied to such tasks arguably falls well behind advances in other domains such as computer vision or natural language processing. This deficit stems, at least in part, from the lack of established/public RDB benchmarks as needed for training and evaluation purposes. As a result, related model development thus far often defaults to tabular approaches trained on ubiquitous single-table benchmarks, or on the relational side, graph-based alternatives such as GNNs applied to a completely different set of graph datasets devoid of tabular characteristics. To more precisely target RDBs lying at the nexus of these two complementary regimes, we explore a broad class of baseline models predicated on: (i) converting multi-table datasets into graphs using various strategies equipped with efficient subsampling, while preserving tabular characteristics; and (ii) trainable models with well-matched inductive biases that output predictions based on these input subgraphs. Then, to address the dearth of suitable public benchmarks and reduce siloed comparisons, we assemble a diverse collection of (i) large-scale RDB datasets and (ii) coincident predictive tasks.

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

Europe > Czechia > Prague (0.04)
Asia > China > Hong Kong (0.04)
North America > Canada > British Columbia > Vancouver (0.04)
(5 more...)

Genre: Research Report > New Finding (0.45)

Industry: Information Technology (0.92)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

Add feedback

25cd345233c65fac1fec0ce61d0f7836-Paper-Datasets_and_Benchmarks_Track.pdf

Neural Information Processing SystemsFeb-9-2026, 16:15:59 GMT

data scientist, database, relational database, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Asia > China > Liaoning Province > Shenyang (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Ontology Learning and Knowledge Graph Construction: A Comparison of Approaches and Their Impact on RAG Performance

da Cruz, Tiago, Tavares, Bernardo, Belo, Francisco

arXiv.org Artificial IntelligenceNov-11-2025

Retrieval-Augmented Generation (RAG) systems combine Large Language Models (LLMs) with external knowledge, and their performance depends heavily on how that knowledge is represented. This study investigates how different Knowledge Graph (KG) construction strategies influence RAG performance. We compare a variety of approaches: standard vector-based RAG, GraphRAG, and retrieval over KGs built from ontologies derived either from relational databases or textual corpora. Results show that ontology-guided KGs incorporating chunk information achieve competitive performance with state-of-the-art frameworks, substantially outperforming vector retrieval baselines. Moreover, the findings reveal that ontology-guided KGs built from relational databases perform competitively to ones built with ontologies extracted from text, with the benefit of offering a dual advantage: they require a one-time-only ontology learning process, substantially reducing LLM usage costs; and avoid the complexity of ontology merging inherent to text-based approaches.

artificial intelligence, machine learning, natural language, (13 more...)

arXiv.org Artificial Intelligence

2511.05991

Genre: Research Report > New Finding (1.00)

Industry: Construction & Engineering (0.35)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

Add feedback

Integrating Temporal and Structural Context in Graph Transformers for Relational Deep Learning

Lachi, Divyansha, Mohammadi, Mahmoud, Meyer, Joe, Arora, Vinam, Palczewski, Tom, Dyer, Eva L.

arXiv.org Artificial IntelligenceNov-7-2025

In domains such as healthcare, finance, and e-commerce, the temporal dynamics of relational data emerge from complex interactions-such as those between patients and providers, or users and products across diverse categories. To be broadly useful, models operating on these data must integrate long-range spatial and temporal dependencies across diverse types of entities, while also supporting multiple predictive tasks. However, existing graph models for relational data primarily focus on spatial structure, treating temporal information merely as a filtering constraint to exclude future events rather than a modeling signal, and are typically designed for single-task prediction. To address these gaps, we introduce a temporal subgraph sampler that enhances global context by retrieving nodes beyond the immediate neighborhood to capture temporally relevant relationships. In addition, we propose the Relational Graph Perceiver (RGP), a graph transformer architecture for relational deep learning that leverages a cross-attention-based latent bottleneck to efficiently integrate information from both structural and temporal contexts. This latent bottleneck integrates signals from different node and edge types into a common latent space, enabling the model to build global context across the entire relational system. RGP also incorporates a flexible cross-attention decoder that supports joint learning across tasks with disjoint label spaces within a single model. Experiments on RelBench, SALT, and CTU show that RGP delivers state-of-the-art performance, offering a general and scalable solution for relational deep learning with support for diverse predictive tasks.

artificial intelligence, integrating temporal and structural context, machine learning, (12 more...)

arXiv.org Artificial Intelligence

2511.04557

Country: North America > United States > Pennsylvania (0.14)

Genre: Research Report (0.82)

Industry: Information Technology > Services (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

RDB2G-Bench: A Comprehensive Benchmark for Automatic Graph Modeling of Relational Databases

Choi, Dongwon, Kim, Sunwoo, Kim, Juyeon, Kim, Kyungho, Lee, Geon, Kang, Shinhwan, Kim, Myunghwan, Shin, Kijung

arXiv.org Artificial IntelligenceOct-29-2025

Recent advances have demonstrated the effectiveness of graph-based learning on relational databases (RDBs) for predictive tasks. Such approaches require transforming RDBs into graphs, a process we refer to as RDB-to-graph modeling, where rows of tables are represented as nodes and foreign-key relationships as edges. Yet, effective modeling of RDBs into graphs remains challenging. Specifically, there exist numerous ways to model RDBs into graphs, and performance on predictive tasks varies significantly depending on the chosen graph model of RDBs. In our analysis, we find that the best-performing graph model can yield up to a 10% higher performance compared to the common heuristic rule for graph modeling, which remains non-trivial to identify. To foster research on intelligent RDB-to-graph modeling, we introduce RDB2G-Bench, the first benchmark framework for evaluating such methods. We construct extensive datasets covering 5 real-world RDBs and 12 predictive tasks, resulting in around 50k graph model-performance pairs for efficient and reproducible evaluations. Thanks to our precomputed datasets, we were able to benchmark 10 automatic RDB-to-graph modeling methods on the 12 tasks about 380x faster than on-the-fly evaluation, which requires repeated GNN training. Our analysis of the datasets and benchmark results reveals key structural patterns affecting graph model effectiveness, along with practical implications for effective graph modeling. Our datasets and code are available at https://github.com/chlehdwon/RDB2G-Bench.

data mining, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2506.0136

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.96)
(3 more...)

Add feedback