bottom cluster
Reviews: Hierarchical Clustering Beyond the Worst-Case
This paper studies the problem of hierarchical clustering in a beyond-the-worst-case setting, where the data is a random graph generated by a Hierarchical Stochastic Block Model (HSBM). It proposes an SVD linkage algorithm and proves that in certain regimes it returns a solution that is a constant approximation to the cost function of hierarchical clustering proposed by Dasgupta. It also considers an algorithm that combines the SDP relaxation approach and recursive sparsest cut approach in previous works to get a constant approximation in larger regimes. Roughly speaking, the HSBM model considered assumes some underlying tree T* over k leaves (each leaf corresponding to a bottom cluster containing a certain number of vertices). Each node in the tree has some weight in (0,1) and a node's weight should not be larger than its descendant's.
WISK: A Workload-aware Learned Index for Spatial Keyword Queries
Sheng, Yufan, Cao, Xin, Fang, Yixiang, Zhao, Kaiqi, Qi, Jianzhong, Cong, Gao, Zhang, Wenjie
Spatial objects often come with textual information, such as Points of Interest (POIs) with their descriptions, which are referred to as geo-textual data. To retrieve such data, spatial keyword queries that take into account both spatial proximity and textual relevance have been extensively studied. Existing indexes designed for spatial keyword queries are mostly built based on the geo-textual data without considering the distribution of queries already received. However, previous studies have shown that utilizing the known query distribution can improve the index structure for future query processing. In this paper, we propose WISK, a learned index for spatial keyword queries, which self-adapts for optimizing querying costs given a query workload. One key challenge is how to utilize both structured spatial attributes and unstructured textual information during learning the index. We first divide the data objects into partitions, aiming to minimize the processing costs of the given query workload. We prove the NP-hardness of the partitioning problem and propose a machine learning model to find the optimal partitions. Then, to achieve more pruning power, we build a hierarchical structure based on the generated partitions in a bottom-up manner with a reinforcement learning-based approach. We conduct extensive experiments on real-world datasets and query workloads with various distributions, and the results show that WISK outperforms all competitors, achieving up to 8x speedup in querying time with comparable storage overhead.
- Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- (5 more...)