AITopics | cardinality estimator

623307df18da128262aaf394cdcfb235-Paper-Conference.pdf

Neural Information Processing Systems, Aug-15-2025, 06:49:18 GMT

algorithm, cardinality, sketch

Country: North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.46)

Sketched Sum-Product Networks for Joins

Tsan, Brian, Amanbayev, Abylay, Datta, Asoke, Rusu, Florin

arXiv.org Artificial Intelligence, Jun-18-2025

Sketches have shown high accuracy in multi-way join cardinality estimation, a critical problem in cost-based query optimization. Accurately estimating the cardinality of a join operation, which serves as a proxy for its computational cost, allows relational database systems to optimize query execution costs. However, although sketches have proven effective in query optimization, they are typically constructed for predefined selections in queries that are assumed to be given a priori, which hinders their applicability to new queries. As a more general solution, we propose using Sum-Product Networks to approximate sketches on the fly. Sum-Product Networks can decompose and model multivariate distributions, such as relations, as linear combinations of multiple univariate distributions. By representing these univariate distributions as sketches, Sum-Product Networks can combine them element-wise to efficiently approximate the sketch of any query selection. These approximate sketches can then be applied to join cardinality estimation. In particular, we implement the Fast-AGMS and Bound Sketch methods, which have been used successfully in prior work despite their costly construction. By accurately approximating them instead, our work provides a practical way to apply these sketches to query optimization.
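To make the sketch idea in this abstract concrete, here is a minimal illustration of a Fast-AGMS-style sketch and of the element-wise combination the abstract mentions. It is a sketch under stated assumptions, not the authors' implementation: the names `build_sketch`, `combine`, and `estimate_join_size` are hypothetical, and a BLAKE2b digest stands in for the hash families a real Fast-AGMS sketch would use. Only the linear (sum-like) combination is shown; the Sum-Product Network itself is not modeled.

```python
import hashlib
from statistics import median


def _hash(key, row, purpose):
    """Deterministic 64-bit hash; a stand-in for a proper hash family (assumption)."""
    data = f"{purpose}|{row}|{key!r}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def build_sketch(keys, width=64, depth=5):
    """Fast-AGMS-style sketch: per row, each key adds +1 or -1 to one bucket."""
    sketch = [[0] * width for _ in range(depth)]
    for key in keys:
        for row in range(depth):
            bucket = _hash(key, row, "bucket") % width
            sign = 1 if _hash(key, row, "sign") & 1 else -1
            sketch[row][bucket] += sign
    return sketch


def combine(sketches, weights):
    """Element-wise weighted sum of sketches that share width and depth."""
    depth, width = len(sketches[0]), len(sketches[0][0])
    return [
        [sum(w * s[r][c] for s, w in zip(sketches, weights)) for c in range(width)]
        for r in range(depth)
    ]


def estimate_join_size(left, right):
    """Median over rows of the dot product of the two sketches."""
    return median(sum(a * b for a, b in zip(lr, rr)) for lr, rr in zip(left, right))


# One distinct join key, 3 rows on one side and 4 on the other: 3 * 4 = 12 matches.
# With a single key there are no hash collisions, so the estimate is exact here.
print(estimate_join_size(build_sketch(["x"] * 3), build_sketch(["x"] * 4)))  # 12
```

Because every update is an addition, the sketch of two concatenated key lists equals the element-wise sum of their individual sketches. That linearity is what makes combining precomputed sketches possible at all; with many distinct keys the join estimate becomes approximate rather than exact.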

machine learning, natural language, sketch

2506.14034

Country: North America > United States > California (0.46)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Optimize Cardinality Estimation Model Pretraining by Simplifying the Training Datasets

Fang, Boyang

arXiv.org Artificial Intelligence, Feb-20-2025

Cardinality estimation is a key aspect of query optimization research, and its performance has improved significantly with the integration of machine learning. To overcome the "cold start" problem, that is, the lack of model transferability in learned cardinality estimators, several pre-trained cardinality estimation models have been proposed that learn across multiple datasets and their corresponding workloads. These models are typically trained on a dataset created by sampling uniformly from many datasets, but this approach may not be optimal. By applying the Group Distributionally Robust Optimization (Group DRO) algorithm to the training datasets, we find that some training datasets contribute more to model performance than others. Based on this observation, we conduct extensive experiments to examine pre-training of cardinality estimators more closely. Our results show how the performance of these models is influenced by the datasets and their corresponding workloads. Finally, we introduce a simplified training dataset, reduced to a fraction of the size of existing pre-training datasets. Experimental results demonstrate that a pre-trained cardinality estimator based on this simplified dataset can still achieve performance comparable to existing models in zero-shot setups.
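As a rough illustration of how Group DRO can reveal which training datasets matter most, the following sketch shows the group-weight update from one common formulation of the algorithm (an exponentiated-gradient step on per-group losses). The function names, the step size, and the loss values are assumptions for illustration; the abstract does not state the exact variant or hyperparameters the paper uses.

```python
import math


def group_dro_step(weights, group_losses, step_size=0.1):
    """Shift weight toward groups (here: training datasets) with higher loss."""
    scaled = [w * math.exp(step_size * loss) for w, loss in zip(weights, group_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]  # renormalize so the weights sum to 1


def robust_loss(weights, group_losses):
    """Weighted loss the model would minimize after the weight update."""
    return sum(w * loss for w, loss in zip(weights, group_losses))


# Hypothetical per-dataset losses for two training datasets.
weights = group_dro_step([0.5, 0.5], [1.0, 3.0], step_size=1.0)
print(weights)                           # the second dataset now carries more weight
print(robust_loss(weights, [1.0, 3.0]))  # above the uniform average of 2.0
```

Tracking these weights over training gives a per-dataset importance signal: datasets whose weight stays high are the ones the model finds hardest to fit.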

cardinality estimator, dataset, pre-trained cardinality estimator

2502.1435

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.67)

Xling: A Learned Filter Framework for Accelerating High-Dimensional Approximate Similarity Join

Wang, Yifan, Pathak, Vyom, Wang, Daisy Zhe

arXiv.org Artificial Intelligence, Feb-20-2024

Similarity join finds all pairs of points that lie within a given distance threshold of each other. Many similarity join methods have been proposed, but they are usually inefficient in high-dimensional spaces because of the curse of dimensionality and data-unawareness. We investigate the possibility of using the metric space Bloom filter (MSBF), a family of data structures that check whether a query point has neighbors in a multi-dimensional space, to speed up similarity join. However, applying MSBF to similarity join raises several challenges, including excessive information loss, data-unawareness, and hard constraints on the distance metric. In this paper, we propose Xling, a generic framework for building a learning-based metric space filter with any existing regression model, with the aim of accurately predicting whether a query point has a sufficient number of neighbors. The framework provides a suite of optimization strategies that further improve the prediction quality of the learned model, and it has demonstrated significantly higher prediction quality than existing MSBF. We also propose XJoin, one of the first filter-based similarity join methods, built on Xling. By predicting and skipping queries without enough neighbors, XJoin effectively reduces unnecessary neighbor searching and therefore achieves a remarkable acceleration. Benefiting from the generalization capability of deep learning models, XJoin can be transferred easily to new datasets (with a similar distribution) without re-training. Furthermore, Xling is not limited to XJoin; it acts as a flexible plugin that can be inserted into any loop-based similarity join method for a speedup.
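The filter-then-search pattern described in this abstract can be sketched as a plain loop-based join with an optional skip test. This is an illustration only: `similarity_join`, `estimate_neighbors`, and `min_neighbors` are hypothetical names, and the estimator here is a hand-written stub rather than a trained regression model.

```python
import math


def similarity_join(queries, points, threshold, estimate_neighbors=None, min_neighbors=1):
    """Loop-based similarity join that skips queries predicted to have too few neighbors.

    estimate_neighbors is any callable mapping a query point to a predicted
    neighbor count (assumption: a stand-in for a learned filter).
    """
    pairs, searched = [], 0
    for qi, q in enumerate(queries):
        if estimate_neighbors is not None and estimate_neighbors(q) < min_neighbors:
            continue  # filter says "not enough neighbors": skip the search
        searched += 1
        for pi, p in enumerate(points):
            if math.dist(q, p) <= threshold:
                pairs.append((qi, pi))
    return pairs, searched


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
queries = [(0.5, 0.0), (50.0, 50.0)]


def stub_estimator(q):
    """Hypothetical predictor: pretends only points near the origin have neighbors."""
    return 2 if q[0] < 5 else 0


print(similarity_join(queries, points, threshold=1.0))
print(similarity_join(queries, points, threshold=1.0, estimate_neighbors=stub_estimator))
```

Both calls return the same pairs, but the filtered call runs one neighbor search instead of two. A filter that wrongly rejects a query would drop its results, which is why the prediction quality stressed in the abstract matters.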

2402.13397

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Hillsborough County > University (0.04)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)

Learned Accelerator Framework for Angular-Distance-Based High-Dimensional DBSCAN

Wang, Yifan, Wang, Daisy Zhe

arXiv.org Artificial Intelligence, Feb-6-2023

Density-based clustering is a commonly used tool in data science, and much data science work today relies on high-dimensional neural embeddings. However, traditional density-based clustering techniques such as DBSCAN perform poorly on high-dimensional data. In this paper, we propose LAF, a generic learned accelerator framework that speeds up the original DBSCAN and its sampling-based variants on high-dimensional data with the angular distance metric. The framework consists of a learned cardinality estimator and a post-processing module. The cardinality estimator quickly predicts whether a data point is a core point so that unnecessary range queries can be skipped, while the post-processing module detects false negative predictions and merges falsely separated clusters. The evaluation shows that our LAF-enhanced DBSCAN method outperforms state-of-the-art efficient DBSCAN variants in both efficiency and quality.
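The core-point test that the estimator is meant to short-circuit can be sketched as follows. This is a minimal illustration under several assumptions: angular distance is taken to be the angle in radians between two vectors, a point counts itself as a neighbor, and `predict_count` is a hypothetical stand-in for the learned cardinality estimator. The post-processing module that repairs false negatives is not shown.

```python
import math


def angular_distance(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.hypot(*u) * math.hypot(*v)
    return math.acos(max(-1.0, min(1.0, dot / norm)))  # clamp against rounding error


def find_core_points(points, eps, min_pts, predict_count=None):
    """Return indices of core points and the number of range queries executed."""
    core, range_queries = [], 0
    for i, p in enumerate(points):
        if predict_count is not None and predict_count(p) < min_pts:
            continue  # predicted non-core: skip the expensive range query
        range_queries += 1
        neighbors = sum(1 for q in points if angular_distance(p, q) <= eps)
        if neighbors >= min_pts:
            core.append(i)
    return core, range_queries


points = [(1.0, 0.0), (0.99, 0.1), (0.98, 0.2), (0.0, 1.0), (-1.0, 0.0)]


def stub_predictor(p):
    """Hypothetical estimator: guesses that only points with large x are dense."""
    return 3 if p[0] > 0.5 else 1


print(find_core_points(points, eps=0.3, min_pts=3))
print(find_core_points(points, eps=0.3, min_pts=3, predict_count=stub_predictor))
```

In this toy data the three vectors near the positive x-axis are core points in both runs, while the estimator cuts the number of range queries from five to three.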

artificial intelligence, dbscan, machine learning

2302.03136

Country:

North America > United States > Florida > Alachua County > Gainesville (0.14)
North America > United States > District of Columbia > Washington (0.05)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
