A Discussion of the generative model 1

Neural Information Processing Systems

Thus, we verify that the random effects estimator is equivalent to the generative model (1). Specifically, if u(x) = 1 for all x ∈ X, we write (X, P, ψ) for simplicity. Due to the separability of ψ, we obtain the following coreset definition. By Definitions 2.2 and 2.3, the regression objectives of OLSE and GLSE can be decomposed, so the above definition applies to coresets for both OLSE and GLSE. We are now ready to describe the FL framework in the language of a query space. We first prove Theorem C.1 and propose the corresponding algorithm that constructs a coreset; next, we prove Theorem C.2 and propose the corresponding algorithm that constructs an accurate one. By Caratheodory's theorem, there must exist at most … such points; to accelerate the running time, Jubran et al. [ … ] give a faster construction. In this section, we complete the proofs for GLSE.
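The Caratheodory reduction invoked above can be made concrete. The sketch below shows the standard construction (not the paper's algorithm, and not the accelerated variant of Jubran et al.): it reduces a convex combination of n points in R^d to an equivalent one supported on at most d + 1 points.

```python
import numpy as np

def caratheodory(points, weights):
    """Reduce a convex combination sum_i w_i * p_i of n points in R^d to
    one with at most d + 1 nonzero weights and the same weighted sum
    (Caratheodory's theorem). points: (n, d); weights: (n,), >= 0, sum 1."""
    points = np.array(points, dtype=float)
    weights = np.array(weights, dtype=float)
    d = points.shape[1]
    while np.count_nonzero(weights) > d + 1:
        idx = np.flatnonzero(weights)
        p, w = points[idx], weights[idx]
        # m - 1 > d difference vectors in R^d are linearly dependent:
        # grab an exact null-space direction from the trailing row of V^T.
        diffs = (p[1:] - p[0]).T              # shape (d, m-1)
        _, _, vt = np.linalg.svd(diffs)       # full_matrices=True by default
        v = vt[-1]
        # u satisfies sum(u) = 0 and sum(u_i * p_i) = 0.
        u = np.concatenate(([-v.sum()], v))
        # Shift weights along u until some active weight hits zero;
        # the weighted sum and the total weight are both preserved.
        pos = u > 1e-12
        alpha = np.min(w[pos] / u[pos])
        weights[idx] = w - alpha * u
        weights[weights < 1e-12] = 0.0
    return weights
```

Each pass zeroes at least one weight, so at most n - d - 1 iterations are needed; the accelerated schemes cited above improve on exactly this repeated null-space computation.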





Piecewise Linear Approximation in Learned Index Structures: Theoretical and Empirical Analysis

Qin, Jiayong, Zhu, Xianyu, Liu, Qiyu, Zhang, Guangyi, Cai, Zhigang, Liao, Jianwei, Hu, Sha, Peng, Jingshu, Shao, Yingxia, Chen, Lei

arXiv.org Artificial Intelligence

A growing trend in the database and system communities is to augment conventional index structures, such as B+-trees, with machine learning (ML) models. Among these, error-bounded Piecewise Linear Approximation ($ε$-PLA) has emerged as a popular choice due to its simplicity and effectiveness. Despite its central role in many learned indexes, the design and analysis of $ε$-PLA fitting algorithms remain underexplored. In this paper, we revisit $ε$-PLA from both theoretical and empirical perspectives, with a focus on its application in learned index structures. We first establish a fundamentally improved lower bound of $Ω(κ\cdot ε^2)$ on the expected segment coverage for existing $ε$-PLA fitting algorithms, where $κ$ is a data-dependent constant. We then present a comprehensive benchmark of state-of-the-art $ε$-PLA algorithms when used in different learned data structures. Our results highlight key trade-offs among model accuracy, model size, and query performance, providing actionable guidelines for the principled design of future learned data structures.
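For intuition about what an ε-PLA fitter does, here is a minimal sketch of the classic greedy "shrinking cone" rule, one simple fitting strategy from this literature (the function name, the midpoint slope choice, and the assumption of sorted distinct keys are ours, not the paper's): each segment keeps every covered key's predicted rank within ±ε of its true rank.

```python
def fit_pla(keys, eps):
    """Greedy 'shrinking cone' epsilon-PLA over sorted, distinct keys.

    Returns segments as (start_key, slope, start_rank). For every key a
    segment covers, start_rank + slope * (key - start_key) is within
    +/- eps of the key's true rank."""
    segments = []
    i, n = 0, len(keys)
    while i < n:
        x0, y0 = keys[i], i
        lo, hi = float("-inf"), float("inf")
        j = i + 1
        while j < n:
            dx = keys[j] - x0
            # Slopes keeping point j's error within eps form an interval;
            # intersect it with the running cone, stop when it empties.
            cand_lo = max(lo, (j - eps - y0) / dx)
            cand_hi = min(hi, (j + eps - y0) / dx)
            if cand_lo > cand_hi:
                break
            lo, hi = cand_lo, cand_hi
            j += 1
        slope = (lo + hi) / 2 if hi < float("inf") else 0.0
        segments.append((x0, slope, y0))
        i = j
    return segments
```

Any slope inside the surviving cone satisfies all covered keys simultaneously; the midpoint is just a convenient pick. The segment-coverage quantity in the paper's lower bound is exactly how many keys a loop like the inner one absorbs before the cone empties.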


Dual-Branch HNSW Approach with Skip Bridges and LID-Driven Optimization

Nguyen, Hy, Nguyen, Nguyen Hung, Nguyen, Nguyen Linh Bao, Thudumu, Srikanth, Du, Hung, Vasa, Rajesh, Mouzakis, Kon

arXiv.org Artificial Intelligence

The Hierarchical Navigable Small World (HNSW) algorithm is widely used for approximate nearest neighbor (ANN) search, leveraging the principles of navigable small-world graphs. However, it faces some limitations. The first is the local optima problem, which arises from the algorithm's greedy search strategy of selecting neighbors based solely on proximity at each step; this often leads to cluster disconnections. The second is that HNSW frequently fails to achieve logarithmic complexity, particularly on high-dimensional datasets, due to exhaustive traversal through each layer. To address these limitations, we propose a novel algorithm that mitigates local optima and cluster disconnections while enhancing construction speed and maintaining inference speed. The first component is a dual-branch HNSW structure with LID-based insertion mechanisms, enabling traversal from multiple directions. This improves outlier node capture, enhances cluster connectivity, accelerates construction, and reduces the risk of local minima. The second component incorporates a bridge-building technique that bypasses redundant intermediate layers, preserving inference speed and offsetting the additional computational overhead introduced by the dual-branch structure. Experiments on various benchmarks and datasets showed that our algorithm outperforms the original HNSW in both accuracy and speed. We evaluated six datasets across Computer Vision (CV) and Natural Language Processing (NLP), showing recall improvements of 18\% in NLP and up to 30\% in CV tasks, while reducing construction time by up to 20\% and maintaining inference speed. We did not observe any trade-offs in our algorithm. Ablation studies revealed that LID-based insertion had the greatest impact on performance, followed by the dual-branch structure and bridge-building components.
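The local-optimum limitation comes from the greedy descent HNSW performs on each layer, sketched below in deliberately simplified form (squared Euclidean distance, plain adjacency dict; real HNSW tracks a beam of candidates controlled by an ef parameter, so this is an illustration of the failure mode, not the full algorithm).

```python
def greedy_search(graph, vectors, entry, query):
    """Single-layer greedy descent, the primitive HNSW repeats per level.

    Hops to whichever neighbor is closest to the query and stops when no
    neighbor improves -- the point where the search can be stranded at a
    local optimum if the graph has poorly connected clusters."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    cur, cur_d = entry, dist(vectors[entry], query)
    improved = True
    while improved:
        improved = False
        for nb in graph[cur]:
            d = dist(vectors[nb], query)
            if d < cur_d:
                cur, cur_d, improved = nb, d, True
    return cur
```

If the query's true nearest neighbor sits in a cluster reachable only through nodes farther from the query than the current one, this loop terminates early; inserting from multiple directions, as the dual-branch structure does, is one way to add the missing escape routes.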


Learning Lossless Compression for High Bit-Depth Volumetric Medical Image

Wang, Kai, Bai, Yuanchao, Li, Daxin, Zhai, Deming, Jiang, Junjun, Liu, Xianming

arXiv.org Artificial Intelligence

Recent advances in learning-based methods have markedly enhanced the capabilities of image compression. However, these methods struggle with high bit-depth volumetric medical images, facing issues such as degraded performance, increased memory demand, and reduced processing speed. To address these challenges, this paper presents the Bit-Division based Lossless Volumetric Image Compression (BD-LVIC) framework, which is tailored for high bit-depth medical volume compression. The BD-LVIC framework skillfully divides the high bit-depth volume into two lower bit-depth segments: the Most Significant Bit-Volume (MSBV) and the Least Significant Bit-Volume (LSBV). The MSBV concentrates on the most significant bits of the volumetric medical image, capturing vital structural details in a compact manner. This reduction in complexity greatly improves compression efficiency using traditional codecs. Conversely, the LSBV deals with the least significant bits, which encapsulate intricate texture details. To compress this detailed information effectively, we introduce an effective learning-based compression model equipped with a Transformer-Based Feature Alignment Module, which exploits both intra-slice and inter-slice redundancies to accurately align features. Subsequently, a Parallel Autoregressive Coding Module merges these features to precisely estimate the probability distribution of the least significant bit-planes. Our extensive testing demonstrates that the BD-LVIC framework not only sets new performance benchmarks across various datasets but also maintains a competitive coding speed, highlighting its significant potential and practical utility in the realm of volumetric medical image compression.
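The bit-division step itself can be sketched in a few lines (the 8-bit split point and the function names below are illustrative assumptions, not the paper's exact configuration; in the framework the MSBV goes to a traditional codec and the LSBV to the learned model).

```python
import numpy as np

def bit_divide(volume, low_bits=8):
    """Split a high bit-depth volume into MSB and LSB sub-volumes.

    The MSBV carries coarse structure in few distinct values; the LSBV
    carries the remaining fine texture detail."""
    vol = volume.astype(np.uint16)
    msbv = vol >> low_bits                   # Most Significant Bit-Volume
    lsbv = vol & ((1 << low_bits) - 1)       # Least Significant Bit-Volume
    return msbv, lsbv

def bit_merge(msbv, lsbv, low_bits=8):
    """Losslessly reassemble the original volume from the two halves."""
    return (msbv.astype(np.uint16) << low_bits) | lsbv.astype(np.uint16)
```

Because the split is a pure bit partition, reassembly is exact, so the overall pipeline stays lossless as long as each sub-volume is compressed losslessly.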