A Lightweight Learned Cardinality Estimation Model

Zhu, Yaoyu, Zhang, Jintao, Li, Guoliang, Feng, Jianhua

Aug-14-2025–arXiv.org Artificial Intelligence

--Cardinality estimation is a fundamental task in database management systems, aiming to predict query results accurately without executing the queries. However, existing techniques either achieve low estimation accuracy or take high inference latency. Simultaneously achieving high speed and accuracy becomes critical for the cardinality estimation problem. In this paper, we propose a novel data-driven approach called CoDe (Covering with Decompositions) to address this problem. CoDe employs the concept of covering design, which divides the table into multiple smaller, overlapping segments. For each segment, CoDe utilizes tensor decomposition to accurately model its data distribution. Moreover, CoDe introduces innovative algorithms to select the best-fitting distributions for each query, combining them to estimate the final result. Notably, experimental results show that our method represents a significant advancement in cardinality estimation, achieving state-of-the-art levels of both estimation accuracy and inference efficiency. Across various datasets, CoDe achieves absolute accuracy in estimating more than half of the queries. Cardinality estimation poses a critical challenge in database management systems (DBMS) as it aims to predict query results accurately without executing the queries. This task is crucial for query optimization, as it allows the optimizer to devise the most efficient query plans. Despite numerous proposed solutions, cardinality estimation remains an unsolved problem. Two primary approaches have been explored to tackle this issue: workload-driven methods [17], [32] and data-driven methods [27], [47], [49]. Motivation. Figure 1 illustrates the comparison between our work and the limitations of existing methods. Workload-driven methods focus on learning patterns from historical workloads and their corresponding results. While these methods are generally fast, their accuracy can degrade when workloads change or are randomly generated. This limitation stems from their lack of direct access to the underlying data and their heavy reliance on the distribution of past workloads. As a result, they are positioned in the bottom-right corner of the graph. On the other hand, recent advancements in data-driven methods directly learn the data distribution, significantly improving estimation accuracy. The authors are with the Department of Computer Science and Technology, Tsinghua University, Beijing, China. Data-driven methods are often orders of magnitude slower than workload-driven methods, placing them in the top-left corner of the graph. Achieving both high speed and accuracy simultaneously is a critical challenge in cardinality estimation, which our work aims to address. Recent research, such as UAE [45], has explored hybrid approaches that combine data and workload information, using workload patterns to enhance data learning.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Aug-14-2025

arXiv.org PDF

Add feedback

Country:
- Asia
  - Middle East > UAE (0.25)
  - China > Beijing
    - Beijing (0.24)

Genre:
- Research Report > New Finding (0.34)

Technology:
- Information Technology
  - Databases (1.00)
  - Artificial Intelligence
    - Natural Language > Information Retrieval
      - Query Processing (1.00)
    - Machine Learning > Learning Graphical Models
      - Directed Networks > Bayesian Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found