Learned Accelerator Framework for Angular-Distance-Based High-Dimensional DBSCAN
–arXiv.org Artificial Intelligence
Density-based clustering is a commonly used tool in data science, and many data science applications now rely on high-dimensional neural embeddings. However, traditional density-based clustering techniques such as DBSCAN perform poorly on high-dimensional data. In this paper, we propose LAF, a generic learned accelerator framework that speeds up both the original DBSCAN and its sampling-based variants on high-dimensional data under the angular distance metric. The framework consists of a learned cardinality estimator and a post-processing module. The cardinality estimator quickly predicts whether a data point is a core point, so that unnecessary range queries can be skipped, while the post-processing module detects false negative predictions and merges clusters that were wrongly separated. The evaluation shows that our LAF-enhanced DBSCAN method outperforms the state-of-the-art efficient DBSCAN variants in both efficiency and clustering quality.
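The gating idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`normalize`, `range_query`, `gated_dbscan`) are hypothetical, angular distance is assumed here to be cosine distance on unit vectors, and the boolean array `predicted_core` is a hand-written stand-in for the output of a learned cardinality estimator. The post-processing module is not reproduced.

```python
from collections import deque

import numpy as np


def normalize(X):
    """Scale every row to unit length so a dot product equals the cosine."""
    return X / np.linalg.norm(X, axis=1, keepdims=True)


def range_query(X, i, eps):
    """Indices of all points within cosine distance eps of point i (rows are unit vectors)."""
    return np.flatnonzero(1.0 - X @ X[i] <= eps)


def gated_dbscan(X, eps, min_pts, predicted_core):
    """DBSCAN that runs a range query only for points predicted to be core.

    predicted_core is a boolean array; in LAF it would come from a learned
    cardinality estimator, here it is simply supplied by the caller.
    Returns (labels, number_of_range_queries); label -1 means noise.
    """
    n = X.shape[0]
    labels = np.full(n, -1, dtype=int)
    expanded = np.zeros(n, dtype=bool)
    queries = 0
    cluster = 0
    for i in range(n):
        if expanded[i] or not predicted_core[i]:
            continue  # predicted non-core: skip the expensive range query
        expanded[i] = True
        neighbors = range_query(X, i, eps)
        queries += 1
        if len(neighbors) < min_pts:
            continue  # the prediction was a false positive; i is not core
        labels[i] = cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == -1:
                labels[j] = cluster  # reachable from a core point
            if expanded[j] or not predicted_core[j]:
                continue  # border point or skipped point: do not expand
            expanded[j] = True
            neighbors_j = range_query(X, j, eps)
            queries += 1
            if len(neighbors_j) >= min_pts:
                queue.extend(neighbors_j)
        cluster += 1
    return labels, queries


# Toy data: two tight bundles of directions and one isolated direction.
X = normalize(np.array([
    [1.0, 0.0, 0.0], [1.0, 0.01, 0.0], [1.0, 0.0, 0.01], [1.0, 0.01, 0.01],
    [0.0, 1.0, 0.0], [0.01, 1.0, 0.0], [0.0, 1.0, 0.01], [0.01, 1.0, 0.01],
    [0.0, 0.0, 1.0],
]))
predicted_core = np.array([True] * 8 + [False])  # stand-in estimator output
labels, queries = gated_dbscan(X, eps=0.01, min_pts=3, predicted_core=predicted_core)
baseline_labels, baseline_queries = gated_dbscan(
    X, eps=0.01, min_pts=3, predicted_core=np.ones(len(X), dtype=bool)
)
```

With every point predicted core, the function reduces to ordinary DBSCAN with one range query per point. A point wrongly predicted non-core is never expanded, so a cluster that is held together only through such points can be split; according to the abstract, this is the case the post-processing module is designed to repair.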
Feb-6-2023
- Country:
  - Asia > Singapore (0.04)
  - Oceania > Australia
  - North America > United States
    - District of Columbia > Washington (0.05)
    - New York > New York County
      - New York City (0.04)
    - Florida
      - Alachua County > Gainesville (0.14)
      - Hillsborough County > University (0.04)
- Genre:
  - Research Report (0.40)
- Technology: