database system


Fast Factorized Learning: Powered by In-Memory Database Systems

Stöckl, Bernhard, Schüle, Maximilian E.

arXiv.org Artificial Intelligence

Learning models over factorized joins avoids redundant computations by identifying and pre-computing shared cofactors. Previous work has investigated the performance gain of computing cofactors on traditional disk-based database systems, but in the absence of published code, those experiments could not be reproduced on in-memory database systems. This work describes an implementation that uses cofactors for in-database factorized learning. We benchmark our open-source implementation for learning linear regression on factorized joins with PostgreSQL -- a disk-based database system -- and HyPer -- an in-memory engine. The evaluation shows that factorized learning on in-memory database systems is 70% faster than non-factorized learning and faster by a factor of 100 than on disk-based database systems. Thus, modern database engines can contribute to the machine learning pipeline by pre-computing aggregates prior to data extraction to accelerate training.
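
The shared-cofactor idea can be illustrated with a toy aggregate over a two-table join; the tables and function names below are invented, and real cofactor computation covers the full matrix of sufficient statistics for linear regression, not a single entry:

```python
from collections import defaultdict

# Toy tables: R(key, x1) and S(key, x2, y). Contents are illustrative,
# not taken from the paper's benchmark.
R = [(1, 2.0), (1, 3.0), (2, 1.0)]
S = [(1, 5.0, 10.0), (2, 4.0, 6.0), (2, 7.0, 9.0)]

def cofactors_via_join(R, S):
    # Baseline: materialize the join, then sum x1*y (one cofactor entry).
    total = 0.0
    for k_r, x1 in R:
        for k_s, x2, y in S:
            if k_r == k_s:
                total += x1 * y
    return total

def cofactors_factorized(R, S):
    # Factorized: aggregate each side per join key first, then combine.
    # SUM over the join of x1*y equals, per key k,
    # (SUM of x1 in R with key k) * (SUM of y in S with key k).
    sum_x1 = defaultdict(float)
    for k, x1 in R:
        sum_x1[k] += x1
    sum_y = defaultdict(float)
    for k, x2, y in S:
        sum_y[k] += y
    return sum(sum_x1[k] * sum_y[k] for k in sum_x1.keys() & sum_y.keys())
```

Both functions compute the same aggregate, but the factorized variant never enumerates the join result, which is where the reported speedups come from.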


Bootstrapping Learned Cost Models with Synthetic SQL Queries

Nidd, Michael, Miksovic, Christoph, Gschwind, Thomas, Fusco, Francesco, Giovannini, Andrea, Giurgiu, Ioana

arXiv.org Artificial Intelligence

Having access to realistic workloads for a given database instance is extremely important to enable stress and vulnerability testing, as well as to optimize for cost and performance. Recent advances in learned cost models have shown that when enough diverse SQL queries are available, one can effectively and efficiently predict the cost of running a given query against a specific database engine. In this paper, we describe our experience in exploiting modern synthetic data generation techniques, inspired by the generative AI and LLM community, to create high-quality datasets enabling the effective training of such learned cost models. Initial results show that we can improve a learned cost model's predictive accuracy by training it with 45% fewer queries than when using competitive generation approaches.
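
A deliberately naive sketch of workload synthesis over a toy schema (the schema, templates, and constants are invented; the paper's generator is LLM-inspired and produces far more diverse queries):

```python
import random

# Hypothetical toy schema for illustration only.
SCHEMA = {
    "orders": ["id", "customer_id", "total", "created_at"],
    "customers": ["id", "name", "country"],
}
OPS = ["=", ">", "<"]

def random_query(rng):
    """Emit one random single-table SELECT with a WHERE predicate."""
    table = rng.choice(sorted(SCHEMA))
    cols = SCHEMA[table]
    proj = rng.sample(cols, k=rng.randint(1, len(cols)))
    col = rng.choice(cols)
    op = rng.choice(OPS)
    return f"SELECT {', '.join(proj)} FROM {table} WHERE {col} {op} 42"

rng = random.Random(0)  # seeded for reproducible workloads
queries = [random_query(rng) for _ in range(5)]
```

Training a cost model on such queries (paired with measured execution costs) is the downstream step; the paper's contribution is making this generated pool high-quality enough that fewer queries suffice.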


DBAIOps: A Reasoning LLM-Enhanced Database Operation and Maintenance System using Knowledge Graphs

Zhou, Wei, Sun, Peng, Zhou, Xuanhe, Zang, Qianglei, Xu, Ji, Zhang, Tieying, Li, Guoliang, Wu, Fan

arXiv.org Artificial Intelligence

The operation and maintenance (O&M) of database systems is critical to ensuring system availability and performance, typically requiring expert experience (e.g., identifying metric-to-anomaly relations) for effective diagnosis and recovery. However, existing automatic database O&M methods, including commercial products, cannot effectively utilize expert experience. On the one hand, rule-based methods only support basic O&M tasks (e.g., metric-based anomaly detection), which are mostly numerical equations and cannot effectively incorporate literal O&M experience (e.g., troubleshooting guidance in manuals). On the other hand, LLM-based methods, which retrieve fragmented information (e.g., standard documents + RAG), often generate inaccurate or generic results. To address these limitations, we present DBAIOps, a novel hybrid database O&M system that combines reasoning LLMs with knowledge graphs to achieve DBA-style diagnosis. First, DBAIOps introduces a heterogeneous graph model for representing the diagnosis experience, and proposes a semi-automatic graph construction algorithm to build the graph from thousands of documents. Second, DBAIOps develops a collection of (800+) reusable anomaly models that identify both directly alerted metrics and implicitly correlated experience and metrics. Third, for each anomaly, DBAIOps proposes a two-stage graph evolution mechanism to explore relevant diagnosis paths and identify missing relations automatically. It then leverages a reasoning LLM (e.g., DeepSeek-R1) to infer root causes and generate clear diagnosis reports for both DBAs and common users. Our evaluation over four mainstream database systems (Oracle, MySQL, PostgreSQL, and DM8) demonstrates that DBAIOps outperforms state-of-the-art baselines, achieving 34.85% and 47.22% higher accuracy in root cause identification and human evaluation, respectively.
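
A miniature of the diagnosis-path exploration can be sketched as follows; the node names and edges are invented, and DBAIOps' real anomaly models and two-stage graph evolution are far richer than a plain traversal:

```python
from collections import deque

# Hypothetical slice of a heterogeneous diagnosis graph: edges link
# alerted metrics to correlated metrics and candidate root causes.
GRAPH = {
    "metric:cpu_util_high": ["anomaly:runaway_query"],
    "anomaly:runaway_query": ["cause:missing_index", "metric:buffer_miss"],
    "metric:buffer_miss": ["cause:undersized_buffer_pool"],
}

def diagnosis_paths(start, graph):
    """BFS from an alerted metric, collecting paths that end at a root cause."""
    paths, queue = [], deque([[start]])
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node.startswith("cause:"):
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # guard against cycles
                queue.append(path + [nxt])
    return paths
```

In DBAIOps the paths found this way are handed to a reasoning LLM, which ranks candidate root causes and writes the diagnosis report.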


An advanced AI driven database system

Tedeschi, M., Rizwan, S., Shringi, C., Chandgir, V. Devram, Belich, S.

arXiv.org Artificial Intelligence

Contemporary database systems, while effective, suffer from severe complexity and usability issues, especially for individuals who lack technical expertise or are unfamiliar with query languages such as Structured Query Language (SQL). This paper presents a new database system supported by Artificial Intelligence (AI), which is intended to improve the management of data using intuitive interfaces based on natural language processing (NLP), together with the automatic creation of structured queries and semi-structured data formats such as YAML Ain't Markup Language (YAML), JavaScript Object Notation (JSON), and application programming interface (API) documentation. The system is intended to strengthen the potential of databases through the integration of Large Language Models (LLMs) and advanced machine learning algorithms. The integration is intended to enable the automation of fundamental tasks such as data modeling, schema creation, query comprehension, and performance optimization. We present in this paper a system that aims to alleviate the main problems with current database technologies: it is meant to reduce the need for technical skills, the manual tuning required for good performance, and the potential for human error. The AI database employs generative schema inference and format selection to build its schema models and execution formats.
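
The schema-inference step can be illustrated, at a much lower level of sophistication, by a rule-based sketch that derives a column-type map from sample records (the function and type names are invented; the paper's system uses LLMs rather than fixed rules):

```python
import json

def infer_schema(records):
    """Map each key seen in the records to a SQL-ish column type."""
    schema = {}
    for rec in records:
        for key, value in rec.items():
            if isinstance(value, bool):   # bool before int: bool is an int subclass
                t = "BOOLEAN"
            elif isinstance(value, int):
                t = "INTEGER"
            elif isinstance(value, float):
                t = "REAL"
            else:
                t = "TEXT"
            schema.setdefault(key, t)
    return schema

records = json.loads('[{"name": "Ada", "age": 36}, {"name": "Alan", "active": true}]')
```

A generative system would additionally choose between output formats (relational schema, YAML, JSON) based on the data's shape, which this sketch does not attempt.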


In-Context Adaptation to Concept Drift for Learned Database Operations

Zhu, Jiaqi, Cai, Shaofeng, Shen, Yanyan, Chen, Gang, Deng, Fang, Ooi, Beng Chin

arXiv.org Artificial Intelligence

Machine learning has demonstrated transformative potential for database operations, such as query optimization and in-database data analytics. However, dynamic database environments, characterized by frequent updates and evolving data distributions, introduce concept drift, which leads to performance degradation for learned models and limits their practical applicability. Addressing this challenge requires efficient frameworks capable of adapting to shifting concepts while minimizing the overhead of retraining or fine-tuning. In this paper, we propose FLAIR, an online adaptation framework that introduces a new paradigm called in-context adaptation for learned database operations. FLAIR leverages the inherent property of data systems, i.e., the immediate availability of execution results for predictions, to enable dynamic context construction. By formalizing adaptation as f: (x | C_t) → y, with C_t representing a dynamic context memory, FLAIR delivers predictions aligned with the current concept, eliminating the need for runtime parameter optimization. To achieve this, FLAIR integrates two key modules: a Task Featurization Module for encoding task-specific features into standardized representations, and a Dynamic Decision Engine, pre-trained via Bayesian meta-training, to adapt seamlessly using contextual information at runtime. Extensive experiments across key database tasks demonstrate that FLAIR outperforms state-of-the-art baselines, achieving up to 5.2x faster adaptation and reducing error by 22.5% for cardinality estimation.
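
The formulation f: (x | C_t) → y can be illustrated with a deliberately tiny sketch: a nearest-neighbor predictor whose only "adaptation" is a sliding context memory of recent execution feedback (FLAIR's actual decision engine is a meta-trained neural model; everything below is invented for illustration):

```python
from collections import deque

class ContextPredictor:
    """Predict y for x by averaging the k nearest examples in context memory C_t.
    No parameters are ever updated; adaptation happens purely through C_t."""

    def __init__(self, capacity=4):
        self.context = deque(maxlen=capacity)  # C_t: oldest feedback is evicted

    def observe(self, x, y):
        # Execution results are immediately available in a data system,
        # so each (input, true outcome) pair can enter the context.
        self.context.append((x, y))

    def predict(self, x, k=2):
        if not self.context:
            return 0.0
        nearest = sorted(self.context, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
```

Because old feedback ages out of the bounded memory, predictions track the current concept after drift without any retraining, which is the behavior the in-context paradigm formalizes.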


Technical Perspective: Ad Hoc Transactions: What They Are and Why We Should Care

Communications of the ACM

Most database research papers are prescriptive. They identify a technical problem and show us how to solve it. Other papers follow a different path; they are descriptive rather than prescriptive. They tell us how data systems behave in practice and how they are actually used. They employ a different set of tools, such as surveys, software analyses, or user studies.


SQL4NN: Validation and expressive querying of models as data

Gerarts, Mark, Steegmans, Juno, Bussche, Jan Van den

arXiv.org Artificial Intelligence

Any serious machine learning project will quickly produce a multitude of models learned from data. These models are then validated, tested in different ways, modified, retrained, and archived or deployed. This multitude of models also constitutes valuable data in itself. We may refer to such data as intensional, in contrast to what we already know as extensional data: training data, background data, data for validation, etc. Our point of view in this paper is that models are data too and should be managed using database technology, just like the extensional data. In particular, we should be able to query models. Indeed, many tasks that we usually consider as validation, analysis, explanation, verification, pruning, etc., of models [Rud19, Alb21, LAL
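
The "models are data" viewpoint can be made concrete with a toy relational encoding of network weights in SQLite; the table layout and model names are invented and are not SQL4NN's actual encoding:

```python
import sqlite3

# Store each weight of each archived model as one relational row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE weight(model TEXT, layer INT, i INT, j INT, val REAL)")
rows = [
    ("mlp-v1", 0, 0, 0, 0.5), ("mlp-v1", 0, 0, 1, -1.2),
    ("mlp-v1", 1, 0, 0, 2.0), ("mlp-v2", 0, 0, 0, 0.1),
]
conn.executemany("INSERT INTO weight VALUES (?, ?, ?, ?, ?)", rows)

# An example "model validation" query over intensional data:
# the largest absolute weight per archived model.
result = conn.execute(
    "SELECT model, MAX(ABS(val)) FROM weight GROUP BY model ORDER BY model"
).fetchall()
```

Once weights live in tables, pruning candidates, weight statistics, or cross-model comparisons become ordinary SQL instead of ad hoc scripts, which is the management benefit the abstract argues for.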


Security Threats in Agentic AI System

Khan, Raihan, Sarkar, Sayak, Mahata, Sainik Kumar, Jose, Edwin

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) agents have become increasingly prevalent in various applications, from virtual assistants to complex data analysis systems. However, their direct access to databases raises significant concerns regarding privacy and security. This paper examines these critical issues, focusing on the potential risks posed by unrestricted AI access to sensitive data. The rapid advancement of AI technologies has resulted in systems capable of processing vast amounts of data and generating human-like responses. While this progress has provided numerous benefits, it has also introduced new challenges in ensuring data privacy and security. AI agents with direct access to databases may inadvertently expose confidential information, or they may be exploited by malicious actors to access or manipulate sensitive data. Additionally, AI systems' ability to analyze large datasets increases the risk of unintended privacy violations, making them prime targets for attacks aimed at extracting or misusing data. This paper explores the current landscape of AI agent interactions with databases and analyzes the associated risks. It discusses the potential threats to privacy protection and data security as AI agents become more integrated into various applications.


QirK: Question Answering via Intermediate Representation on Knowledge Graphs

Scheerer, Jan Luca, Lykov, Anton, Kayali, Moe, Fountalis, Ilias, Olteanu, Dan, Vasiloglou, Nikolaos, Suciu, Dan

arXiv.org Artificial Intelligence

We demonstrate QirK, a system for answering natural language questions on Knowledge Graphs (KG). QirK can answer structurally complex questions that are still beyond the reach of emerging Large Language Models (LLMs). It does so using a unique combination of database technology, LLMs, and semantic search over vector embeddings. The glue for these components is an intermediate representation (IR). The input question is mapped to the IR using LLMs; the result is then repaired into a valid relational database query with the aid of semantic search on vector embeddings. This allows a practical synthesis of LLM capabilities and KG reliability. A short video demonstrating QirK is available at https://youtu.be/6c81BLmOZ0U.
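
The repair step can be caricatured with string similarity standing in for semantic search over embeddings (the relation vocabulary and IR triples are invented):

```python
from difflib import get_close_matches

# The KG's actual relation vocabulary; LLM output must be repaired onto it.
KG_RELATIONS = ["directed_by", "acted_in", "released_in", "born_in"]

def repair_ir(triples, relations):
    """Snap each LLM-emitted relation to the closest valid KG relation.
    QirK uses semantic search over vector embeddings here; difflib is a
    crude stand-in to show the shape of the repair."""
    repaired = []
    for subj, rel, obj in triples:
        match = get_close_matches(rel, relations, n=1, cutoff=0.0)
        repaired.append((subj, match[0], obj))
    return repaired

# The LLM emitted "director" and "acting"; neither is a valid KG relation.
ir = [("?film", "director", "Nolan"), ("?actor", "acting", "?film")]
```

After repair, every relation in the IR is guaranteed to exist in the KG, so the subsequent translation to a relational query cannot fail on hallucinated predicates.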


In-Database Data Imputation

Perini, Massimo, Nikolic, Milos

arXiv.org Artificial Intelligence

Missing data is a widespread problem in many domains, creating challenges in data analysis and decision making. Traditional techniques for dealing with missing data, such as excluding incomplete records or imputing simple estimates (e.g., mean), are computationally efficient but may introduce bias and disrupt variable relationships, leading to inaccurate analyses. Model-based imputation techniques offer a more robust solution that preserves the variability and relationships in the data, but they demand significantly more computation time, limiting their applicability to small datasets. This work enables efficient, high-quality, and scalable data imputation within a database system using the widely used MICE method. We adapt this method to exploit computation sharing and a ring abstraction for faster model training. To impute both continuous and categorical values, we develop techniques for in-database learning of stochastic linear regression and Gaussian discriminant analysis models. Our MICE implementations in PostgreSQL and DuckDB outperform alternative MICE implementations and model-based imputation techniques by up to two orders of magnitude in terms of computation time, while maintaining high imputation quality.
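
A bare-bones sketch of MICE-style chained imputation, with the stochastic draws omitted and plain least squares in pure Python (the paper's in-database version shares computation via a ring abstraction and also handles categorical values, none of which is modeled here):

```python
def ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mice(rows, n_iter=20):
    """Impute None cells in two-column rows by chained regressions."""
    data = [list(r) for r in rows]
    miss = {(i, c) for i, r in enumerate(data) for c in (0, 1) if r[c] is None}
    for c in (0, 1):  # initialize missing cells with the column mean
        obs = [r[c] for i, r in enumerate(data) if (i, c) not in miss]
        mean = sum(obs) / len(obs)
        for i, cc in miss:
            if cc == c:
                data[i][c] = mean
    for _ in range(n_iter):  # chained equations: regress, re-impute, repeat
        for c in (0, 1):
            other = 1 - c
            pairs = [(r[other], r[c]) for i, r in enumerate(data) if (i, c) not in miss]
            a, b = ols([p[0] for p in pairs], [p[1] for p in pairs])
            for i, cc in miss:
                if cc == c:
                    data[i][c] = a + b * data[i][other]
    return data

# Two columns related by y = 2x, with one missing value in each column.
rows = [[1, 2], [2, 4], [3, 6], [4, None], [None, 10]]
imputed = mice(rows)
```

On this linear toy data the chained iterations converge to the values the relationship implies (8 and 5), illustrating why model-based imputation preserves variable relationships where mean imputation would not.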