AITopics | data system

data system

Supporting Dynamic Agentic Workloads: How Data and Agents Interact

Giurgiu, Ioana, Nidd, Michael E.

arXiv.org Artificial Intelligence, Dec-11-2025

The rise of multi-agent systems powered by large language models (LLMs) and specialized reasoning agents exposes fundamental limitations in today's data management architectures. Traditional databases and data fabrics were designed for static, well-defined workloads, whereas agentic systems exhibit dynamic, context-driven, and collaborative behaviors. Agents continuously decompose tasks, shift attention across modalities, and share intermediate results with peers, producing non-deterministic, multi-modal workloads that strain conventional query optimizers and caching mechanisms. We propose an Agent-Centric Data Fabric, a unified architecture that rethinks how data systems serve, optimize, coordinate, and learn from agentic workloads. To achieve this, we exploit the concepts of attention-guided data retrieval, semantic micro-caching for context-driven agent federations, predictive data prefetching, and quorum-based data serving. Together, these mechanisms enable agents to access representative data faster and more efficiently, while reducing redundant queries, data movement, and inference load across systems. By framing data systems as adaptive collaborators, instead of static executors, we outline new research directions toward behaviorally responsive data infrastructures, where caching, probing, and orchestration jointly enable efficient, context-rich data exchange among dynamic, reasoning-driven agents.
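
To make the idea of semantic micro-caching more concrete, here is a minimal sketch in Python. It is not the authors' design: the class name `SemanticMicroCache`, the bag-of-words `embed` function, the similarity threshold, and the oldest-first eviction rule are all assumptions chosen for illustration. A real system would presumably use a learned encoder and a smarter eviction policy.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (an assumption; stands in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticMicroCache:
    """Hypothetical small cache: serves a stored result when a new query is similar enough."""

    def __init__(self, threshold=0.8, capacity=4):
        self.threshold = threshold
        self.capacity = capacity
        self.entries = []  # list of (embedding, result), oldest first

    def get(self, query):
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]  # semantic hit: reuse a peer's earlier result
        return None         # miss: the caller falls back to the data system

    def put(self, query, result):
        if len(self.entries) >= self.capacity:
            self.entries.pop(0)  # evict the oldest entry
        self.entries.append((embed(query), result))
```

In this sketch, two agents that phrase the same request differently ("total sales by region 2024" and "sales total by region 2024") would share one cached result, which is one way to read the abstract's goal of reducing redundant queries.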

artificial intelligence, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2512.09548

Country:

Asia (0.93)
North America > United States > California (0.28)
North America > United States > Texas (0.28)
Europe > Switzerland > Zürich > Zürich (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Can AI autonomously build, operate, and use the entire data stack?

Agarwal, Arvind, Amini, Lisa, Mehta, Sameep, Samulowitz, Horst, Srinivas, Kavitha

arXiv.org Artificial Intelligence, Dec-10-2025

Enterprise data management is a monumental task. It spans data architecture and systems, integration, quality, governance, and continuous improvement. While AI assistants can help specific personas, such as data engineers and stewards, to navigate and configure the data stack, they fall far short of full automation. However, as AI becomes increasingly capable of tackling tasks that have previously resisted automation due to inherent complexities, we believe there is an imminent opportunity to target fully autonomous data estates. Currently, AI is used in different parts of the data stack, but in this paper, we argue for a paradigm shift from the use of AI in independent data component operations towards a more holistic and autonomous handling of the entire data lifecycle. Towards that end, we explore how each stage of the modern data stack can be autonomously managed by intelligent agents to build self-sufficient systems that can be used not only by human end-users, but also by AI itself. We begin by describing the mounting forces and opportunities that demand this paradigm shift, examine how agents can streamline the data lifecycle, and highlight open questions and areas where additional research is needed. We hope this work will inspire lively debate, stimulate further research, motivate collaborative approaches, and facilitate a more autonomous future for data systems.

agent, artificial intelligence, natural language, (19 more...)

arXiv.org Artificial Intelligence

2512.07926

Country:

North America > United States (0.93)
Europe (0.68)

Genre: Research Report (0.64)

Industry:

Law (1.00)
Information Technology > Security & Privacy (1.00)
Banking & Finance > Trading (0.93)
Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)

Supporting Our AI Overlords: Redesigning Data Systems to be Agent-First

Liu, Shu, Ponnapalli, Soujanya, Shankar, Shreya, Zeighami, Sepanta, Zhu, Alan, Agarwal, Shubham, Chen, Ruiqi, Suwito, Samion, Yuan, Shuo, Stoica, Ion, Zaharia, Matei, Cheung, Alvin, Crooks, Natacha, Gonzalez, Joseph E., Parameswaran, Aditya G.

arXiv.org Artificial Intelligence, Dec-9-2025

Large Language Model (LLM) agents, acting on their users' behalf to manipulate and analyze data, are likely to become the dominant workload for data systems in the future. When working with data, agents employ a high-throughput process of exploration and solution formulation for the given task, one we call agentic speculation. The sheer volume and inefficiencies of agentic speculation can pose challenges for present-day data systems. We argue that data systems need to adapt to more natively support agentic workloads. We take advantage of the characteristics of agentic speculation that we identify (scale, heterogeneity, redundancy, and steerability) to outline a number of new research opportunities for a new agent-first data systems architecture, ranging from new query interfaces, to new query processing techniques, to new agentic memory stores.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.00997

Country: North America > United States > California (0.14)

Genre: Research Report (0.40)

Industry: Information Technology (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
(2 more...)

ARCADE: A Real-Time Data System for Hybrid and Continuous Query Processing across Diverse Data Modalities

Yang, Jingyi, Mo, Songsong, Shi, Jiachen, Yu, Zihao, Shi, Kunhao, Ding, Xuchen, Cong, Gao

arXiv.org Artificial Intelligence, Sep-25-2025

The explosive growth of multimodal data (spanning text, image, video, spatial, and relational modalities), coupled with the need for real-time semantic search and retrieval over these data, has outpaced the capabilities of existing multimodal and real-time database systems, which either lack efficient ingestion and continuous query capability, or fall short in supporting expressive hybrid analytics. We introduce ARCADE, a real-time data system that efficiently supports high-throughput ingestion and expressive hybrid and continuous query processing across diverse data types. ARCADE introduces a unified disk-based secondary index on LSM-based storage for vector, spatial, and text data modalities, a comprehensive cost-based query optimizer for hybrid queries, and an incremental materialized view framework for efficient continuous queries. Built on open-source RocksDB storage and the MySQL query engine, ARCADE outperforms leading multimodal data systems by up to 7.4x on read-heavy and 1.4x on write-heavy workloads.
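
As a rough illustration of why incremental materialized views help continuous queries, the sketch below maintains a filtered `COUNT(*) ... GROUP BY` result one row at a time instead of rescanning the data on every update. It is not ARCADE's framework: the class `IncrementalCountView` and its callbacks `on_insert` and `on_delete` are hypothetical names, and the view is held in memory rather than on LSM-based storage.

```python
from collections import defaultdict


class IncrementalCountView:
    """Hypothetical view: COUNT(*) GROUP BY key over rows matching a predicate."""

    def __init__(self, key_fn, predicate):
        self.key_fn = key_fn        # extracts the grouping key from a row
        self.predicate = predicate  # the continuous query's filter
        self.counts = defaultdict(int)

    def on_insert(self, row):
        # Apply only the delta caused by the new row.
        if self.predicate(row):
            self.counts[self.key_fn(row)] += 1

    def on_delete(self, row):
        # Undo the contribution of a removed row; drop groups that become empty.
        if self.predicate(row):
            key = self.key_fn(row)
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]

    def result(self):
        # Current answer to the continuous query, with no rescan of base data.
        return dict(self.counts)


view = IncrementalCountView(key_fn=lambda r: r["city"],
                            predicate=lambda r: r["kind"] == "cafe")
for row in [{"city": "Zurich", "kind": "cafe"},
            {"city": "Zurich", "kind": "bank"},
            {"city": "Austin", "kind": "cafe"}]:
    view.on_insert(row)
print(view.result())
```

Each ingested row costs a constant amount of work here, which is the property that makes this style of maintenance attractive under high-throughput ingestion.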

artificial intelligence, natural language, real time system, (15 more...)

arXiv.org Artificial Intelligence

2509.19757

Genre: Research Report (0.64)

Industry: Information Technology > Services (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Architecture > Real Time Systems (1.00)

Toward Data Systems That Are Business Semantic Centric and AI Agents Assisted

Pang, Cecil

arXiv.org Artificial Intelligence, Jul-14-2025

Contemporary businesses operate in dynamic environments requiring rapid adaptation to achieve goals and maintain competitiveness. Existing data platforms often fall short by emphasizing tools over alignment with business needs, resulting in inefficiencies and delays. To address this gap, I propose the Business Semantics Centric, AI Agents Assisted Data System (BSDS), a holistic system that integrates architecture, workflows, and team organization to ensure data systems are tailored to business priorities rather than dictated by technical constraints. BSDS redefines data systems as dynamic enablers of business success, transforming them from passive tools into active drivers of organizational growth. BSDS has a modular architecture that comprises curated data linked to business entities, a knowledge base for context-aware AI agents, and efficient data pipelines. AI agents play a pivotal role in assisting with data access and system management, reducing human effort, and improving scalability. Complementing this architecture, BSDS incorporates workflows optimized for both exploratory data analysis and production requirements, balancing speed of delivery with quality assurance. A key innovation of BSDS is its incorporation of the human factor. By aligning data team expertise with business semantics, BSDS bridges the gap between technical capabilities and business needs. Validated through real-world implementation, BSDS accelerates time-to-market for data-driven initiatives, enhances cross-functional collaboration, and provides a scalable blueprint for businesses of all sizes. Future research can build on BSDS to explore optimization strategies using complex systems and adaptive network theories, as well as developing autonomous data systems leveraging AI agents.

agent, artificial intelligence, information management, (18 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ACCESS.2025.3583260

2506.0552

Country: North America > United States > New York (0.28)

Genre:

Workflow (1.00)
Research Report (0.82)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)

Palantir accuses UK doctors of choosing 'ideology over patient interest' in NHS data row

The Guardian, Jul-8-2025, 14:15:32 GMT

Palantir, a US data company that works with Israel's defence ministry, has accused British doctors of choosing "ideology over patient interest" after they attacked the firm's contract to process NHS data. Louis Mosley, Palantir's executive vice-president, hit back at the British Medical Association, which recently said the £330m deal to create a single platform for NHS data – ranging from patient data to bed availability – "threatens to undermine public trust in NHS data systems". In a formal resolution the doctors said last month this was because it was unclear how the sensitive data would be processed by Palantir, which was founded by the Trump donor Peter Thiel. They cited the firm's "track record of creating discriminatory policing software in the US" and its "close links to a US government which shows little regard for international law". But Mosley dismissed the attack when he gave evidence to MPs from the Commons science and technology committee on Tuesday. Palantir has also won contracts to handle mass data controlled by the Ministry of Defence, police and local authorities.

mosley, palantir, patient interest, (12 more...)

The Guardian

Country:

Europe > United Kingdom (1.00)
Asia > Middle East > Israel (0.37)
North America > United States > District of Columbia > Washington (0.05)
(2 more...)

Industry:

Health & Medicine > Health Care Providers & Services (1.00)
Health & Medicine > Government Relations & Public Policy (1.00)
Government > Regional Government > Europe Government > United Kingdom Government (1.00)

Technology: Information Technology > Artificial Intelligence (0.74)

StreamLink: Large-Language-Model Driven Distributed Data Engineering System

Feng, Dawei, Mei, Di, Tan, Huiri, Ren, Lei, Lou, Xianying, Tan, Zhangxi

arXiv.org Artificial Intelligence, May-29-2025

Large Language Models (LLMs) have shown remarkable proficiency in natural language understanding (NLU), opening doors for innovative applications. We introduce StreamLink - an LLM-driven distributed data system designed to improve the efficiency and accessibility of data engineering tasks. We build StreamLink on top of distributed frameworks such as Apache Spark and Hadoop to handle large data at scale. One of the important design philosophies of StreamLink is to respect user data privacy by utilizing local fine-tuned LLMs instead of a public AI service like ChatGPT. With help from domain-adapted LLMs, we can improve our system's understanding of natural language queries from users in various scenarios and simplify the procedure of generating database queries like the Structured Query Language (SQL) for information processing. We also incorporate LLM-based syntax and security checkers to guarantee the reliability and safety of each generated query. StreamLink illustrates the potential of merging generative LLMs with distributed data processing for comprehensive and user-centric data engineering. With this architecture, we allow users to interact with complex database systems at different scales in a user-friendly and security-ensured manner, where the SQL generation reaches over 10% of execution accuracy compared to baseline methods, and allow users to find the most concerned item from hundreds of millions of items within a few seconds using natural language.
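
The abstract describes LLM-based syntax and security checkers for generated SQL. As a point of comparison, the sketch below shows a much simpler rule-based gate built on Python's standard `sqlite3` module; it is a swapped-in technique, not StreamLink's checker. The function name `check_generated_sql` and the SELECT-only policy are assumptions for illustration. The query is compiled with `EXPLAIN` against an empty in-memory copy of the schema, so bad syntax or unknown columns are caught without touching real data.

```python
import sqlite3


def check_generated_sql(sql, schema_ddl):
    """Hypothetical gate: return (ok, reason) for a generated query without running it on real data."""
    stmt = sql.strip().rstrip(";").strip()
    # Security rule (an assumed policy): read-only, single statement.
    if not stmt.lower().startswith("select"):
        return False, "only SELECT statements are allowed"
    if ";" in stmt:
        # Conservative: this also rejects semicolons inside string literals.
        return False, "multiple statements are not allowed"
    conn = sqlite3.connect(":memory:")
    try:
        conn.executescript(schema_ddl)   # empty tables mirroring the target schema
        conn.execute("EXPLAIN " + stmt)  # compiles the query; raises on bad syntax or unknown names
        return True, "ok"
    except sqlite3.Error as exc:
        return False, str(exc)
    finally:
        conn.close()


schema = "CREATE TABLE items (id INTEGER, name TEXT, price REAL);"
print(check_generated_sql("SELECT name FROM items WHERE price < 10;", schema))
print(check_generated_sql("DROP TABLE items", schema))
```

Note that this validates against SQLite's dialect only; a deployment on Spark or another engine would need that engine's own parser.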

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2505.21575

Country:

Asia > China (0.30)
North America > United States (0.29)

Genre:

Research Report (1.00)
Overview > Innovation (0.34)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

LLM-Powered Proactive Data Systems

Zeighami, Sepanta, Lin, Yiming, Shankar, Shreya, Parameswaran, Aditya

arXiv.org Artificial Intelligence, Feb-18-2025

With the power of LLMs, we now have the ability to query data that was previously impossible to query, including text, images, and video. However, despite this enormous potential, most present-day data systems that leverage LLMs are reactive, reflecting our community's desire to map LLMs to known abstractions. Most data systems treat LLMs as an opaque black box that operates on user inputs and data as is, optimizing them much like any other approximate, expensive UDFs, in conjunction with other relational operators. Such data systems do as they are told, but fail to understand and leverage what the LLM is being asked to do (i.e., the underlying operations, which may be error-prone), the data the LLM is operating on (e.g., long, complex documents), or what the user really needs. They don't take advantage of the characteristics of the operations and/or the data at hand, or ensure correctness of results when there are imprecisions and ambiguities. We argue that data systems instead need to be proactive: they need to be given more agency -- armed with the power of LLMs -- to understand and rework the user inputs and the data and to make decisions on how the operations and the data should be represented and processed. By allowing the data system to parse, rewrite, and decompose user inputs and data, or to interact with the user in ways that go beyond the standard single-shot query-result paradigm, the data system is able to address user needs more efficiently and effectively. These new capabilities lead to a rich design space where the data system takes more initiative: it is empowered to perform optimization based on the transformation operations, data characteristics, and user intent. We discuss various successful examples of how this framework has been and can be applied in real-world tasks, and present future directions for this ambitious research agenda.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.13016

Genre: Research Report (0.40)

Industry:

Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.70)
Law (0.69)
Health & Medicine (0.48)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.34)

CREDAL: Close Reading of Data Models

Fletcher, George, Nahurna, Olha, Prytula, Matvii, Stoyanovich, Julia

arXiv.org Artificial Intelligence, Feb-11-2025

Data models are necessary for the birth of data and of any data-driven system. Indeed, every algorithm, every machine learning model, every statistical model, and every database has an underlying data model without which the system would not be usable. Hence, data models are excellent sites for interrogating the (material, social, political, ...) conditions giving rise to a data system. Towards this, drawing inspiration from literary criticism, we propose to closely read data models in the same spirit as we closely read literary artifacts. Close readings of data models reconnect us with, among other things, the materiality, the genealogies, the techne, the closed nature, and the design of technical systems. While recognizing from literary theory that there is no one correct way to read, it is nonetheless critical to have systematic guidance for those unfamiliar with close readings. This is especially true for those trained in the computing and data sciences, who too often are enculturated to set aside the socio-political aspects of data work. A systematic methodology for reading data models currently does not exist. To fill this gap, we present the CREDAL methodology for close readings of data models. We detail our iterative development process and present results of a qualitative evaluation of CREDAL demonstrating its usability, usefulness, and effectiveness in the critical study of data.

artificial intelligence, data model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2502.07943

Country:

North America > United States > New York (0.04)
Europe > Ukraine > Lviv Oblast > Lviv (0.04)
North America > United States > Wisconsin > Dane County > Madison (0.04)
(5 more...)

Genre:

Research Report (1.00)
Personal > Interview (0.93)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.46)

NeurDB: An AI-powered Autonomous Data System

Ooi, Beng Chin, Cai, Shaofeng, Chen, Gang, Shen, Yanyan, Tan, Kian-Lee, Wu, Yuncheng, Xiao, Xiaokui, Xing, Naili, Yue, Cong, Zeng, Lingze, Zhang, Meihui, Zhao, Zhanhao

arXiv.org Artificial Intelligence, Jul-4-2024

In the wake of rapid advancements in artificial intelligence (AI), we stand on the brink of a transformative leap in data systems. The imminent fusion of AI and DB (AIxDB) promises a new generation of data systems, which will relieve the burden on end-users across all industry sectors by featuring AI-enhanced functionalities, such as personalized and automated in-database AI-powered analytics, self-driving capabilities for improved system performance, etc. In this paper, we explore the evolution of data systems with a focus on deepening the fusion of AI and DB. We present NeurDB, an AI-powered autonomous data system designed to fully embrace AI design in each major system component and provide in-database AI-powered analytics. We outline the conceptual and architectural overview of NeurDB, discuss its design choices and key components, and report its current development and future plan.

beng chin ooi, data system, neurdb, (14 more...)

arXiv.org Artificial Intelligence

2405.03924

Country:

Asia > Singapore > Central Region > Singapore (0.04)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China > Shanghai > Shanghai (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
(4 more...)
