AITopics | Nearest Neighbor Methods

Nearest Neighbor Methods

Fully Explainable Classification Models Using Hyperblocks

Snyder, Austin, Gallagher, Ryan, Kovalerchuk, Boris

arXiv.org Artificial Intelligence, Jun-10-2025

Building on existing work with Hyperblocks, which classify data using minimum and maximum bounds for each attribute, we focus on enhancing interpretability, decreasing training time, and reducing model complexity without sacrificing accuracy. This system allows subject matter experts (SMEs) to directly inspect and understand the model's decision logic without requiring extensive machine learning expertise. To reduce Hyperblock complexity while retaining performance, we introduce a suite of algorithms for Hyperblock simplification. These include removing redundant attributes, removing redundant blocks through overlap analysis, and creating disjunctive units. These methods eliminate unnecessary parameters, dramatically reducing model size without harming classification power. We increase robustness by introducing an interpretable fallback mechanism using k-Nearest Neighbor (k-NN) classifiers for points not covered by any block, ensuring complete data coverage while preserving model transparency. Our results demonstrate that interpretable models can scale to high-dimensional, large-volume datasets while maintaining competitive accuracy. On benchmark datasets such as WBC (9-D), we achieve strong predictive performance with significantly reduced complexity. On MNIST (784-D), our method continues to improve through tuning and simplification, showing promise as a transparent alternative to black-box models in domains where trust, clarity, and control are crucial.
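The decision logic described above (per-attribute minimum and maximum bounds, with k-NN handling points that no block covers) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the block layout, the "first matching block wins" rule, the toy 2-D data, and the helper names `in_block`, `knn_label`, and `classify` are all assumptions made for the example.

```python
import math
from collections import Counter

def in_block(point, block):
    """True if every attribute lies inside the block's [min, max] bounds."""
    return all(lo <= x <= hi for x, (lo, hi) in zip(point, block["bounds"]))

def knn_label(point, train_points, train_labels, k=3):
    """Majority label among the k nearest training points (Euclidean distance)."""
    order = sorted(range(len(train_points)),
                   key=lambda i: math.dist(point, train_points[i]))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

def classify(point, blocks, train_points, train_labels, k=3):
    """Return (label, source). Assumption: the first covering block decides."""
    for block in blocks:
        if in_block(point, block):
            return block["label"], "hyperblock"
    # No block covers the point, so fall back to k-NN.
    return knn_label(point, train_points, train_labels, k), "knn-fallback"

# Hypothetical toy data: two attributes, two classes.
blocks = [
    {"label": "benign", "bounds": [(0, 3), (0, 3)]},
    {"label": "malignant", "bounds": [(6, 10), (5, 10)]},
]
train_points = [(1, 1), (2, 2), (1, 3), (7, 8), (8, 6), (9, 9)]
train_labels = ["benign"] * 3 + ["malignant"] * 3

covered = classify((2, 1), blocks, train_points, train_labels)
uncovered = classify((5, 5), blocks, train_points, train_labels)
print(covered, uncovered)
```

Returning the source of each decision alongside the label keeps the fallback visible to a reviewer, which matches the transparency goal stated in the abstract.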

accuracy, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2506.06986

Country: North America > United States (0.29)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Therapeutic Area > Oncology (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

RADAR: Recall Augmentation through Deferred Asynchronous Retrieval

Jaspal, Amit, Dang, Qian, Ramineni, Ajantha

arXiv.org Artificial Intelligence, Jun-10-2025

Modern large-scale recommender systems employ a multi-stage ranking funnel (Retrieval, Pre-ranking, Ranking) to balance engagement and computational constraints (latency, CPU). However, the initial retrieval stage, often relying on efficient but less precise methods like K-Nearest Neighbors (KNN), struggles to effectively surface the most engaging items from billion-scale catalogs, particularly distinguishing highly relevant and engaging candidates from merely relevant ones. We introduce Recall Augmentation through Deferred Asynchronous Retrieval (RADAR), a novel framework that leverages asynchronous, offline computation to pre-rank a significantly larger candidate set for users using the full-complexity ranking model. These top-ranked items are stored and utilized as a high-quality retrieval source during online inference, bypassing online retrieval and pre-ranking stages for these candidates. We demonstrate through offline experiments that RADAR significantly boosts recall (2X Recall@200 vs DNN retrieval baseline) by effectively combining a larger retrieved candidate set with a more powerful ranking model. Online A/B tests confirm a +0.8% lift in topline engagement metrics, validating RADAR as a practical and effective method to improve recommendation quality under strict online serving constraints.
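For readers unfamiliar with the Recall@200 figure quoted above, the metric is commonly computed as the fraction of relevant items that appear among the top K retrieved candidates. The sketch below uses that common definition; the item IDs, the candidate lists, and the function name `recall_at_k` are hypothetical and are not taken from the paper.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of relevant items found in the top-k of a ranked candidate list."""
    relevant = set(relevant_items)
    if not relevant:
        return 0.0
    top_k = set(ranked_items[:k])
    return len(top_k & relevant) / len(relevant)

# Hypothetical candidate lists for one user, best candidate first.
relevant = {"a", "b", "c", "d"}
baseline_candidates = ["a", "x", "y", "b", "z"]
augmented_candidates = ["a", "b", "c", "x", "d"]

baseline_recall = recall_at_k(baseline_candidates, relevant, k=3)
augmented_recall = recall_at_k(augmented_candidates, relevant, k=3)
print(baseline_recall, augmented_recall)
```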

artificial intelligence, machine learning, radar, (16 more...)

arXiv.org Artificial Intelligence

2506.07261

Country: North America > United States > California (0.15)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.51)

N$^2$: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion

Chin, Caleb, Khubchandani, Aashish, Maskara, Harshvardhan, Choi, Kyuseong, Feitelberg, Jacob, Gong, Albert, Paul, Manit, Sadhukhan, Tathagata, Agarwal, Anish, Dwivedi, Raaz

arXiv.org Machine Learning, Jun-5-2025

Nearest neighbor (NN) methods have re-emerged as competitive tools for matrix completion, offering strong empirical performance and recent theoretical guarantees, including entry-wise error bounds, confidence intervals, and minimax optimality. Despite their simplicity, recent work has shown that NN approaches are robust to a range of missingness patterns and effective across diverse applications. This paper introduces N$^2$, a unified Python package and testbed that consolidates a broad class of NN-based methods through a modular, extensible interface. Built for both researchers and practitioners, N$^2$ supports rapid experimentation and benchmarking. Using this framework, we introduce a new NN variant that achieves state-of-the-art results in several settings. We also release a benchmark suite of real-world datasets, from healthcare and recommender systems to causal inference and LLM evaluation, designed to stress-test matrix completion methods beyond synthetic scenarios. Our experiments demonstrate that while classical methods excel on idealized data, NN-based techniques consistently outperform them in real-world settings.
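To give a concrete sense of what nearest-neighbor matrix completion involves, here is a minimal row-wise sketch: a missing entry is filled with the average of that column's values in the k most similar rows, where similarity is measured over columns both rows observe. It does not use or reproduce the N$^2$ package interface; the function names, the mean-squared-difference distance, and the toy ratings matrix are assumptions for illustration.

```python
def row_distance(a, b):
    """Mean squared difference over columns observed in both rows (None = missing)."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return float("inf")
    return sum((x - y) ** 2 for x, y in shared) / len(shared)

def impute_entry(matrix, i, j, k=2):
    """Estimate matrix[i][j] from the k nearest rows that observe column j."""
    candidates = []
    for r, row in enumerate(matrix):
        if r == i or row[j] is None:
            continue
        d = row_distance(matrix[i], row)
        if d != float("inf"):
            candidates.append((d, r))
    if not candidates:
        return None  # no comparable row observes this column
    candidates.sort()
    nearest = candidates[:k]
    return sum(matrix[r][j] for _, r in nearest) / len(nearest)

# Hypothetical ratings matrix; row 0 is missing its last entry.
ratings = [
    [5.0, 4.0, None],
    [5.0, 4.0, 2.0],
    [4.0, 5.0, 3.0],
    [1.0, 1.0, 5.0],
]
estimate = impute_entry(ratings, 0, 2, k=2)
print(estimate)
```

Rows 1 and 2 are closest to row 0 on the shared columns, so the estimate averages their third-column values.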

large language model, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

2506.04166

Country:

North America > United States > California (0.05)
North America > United States > Utah (0.04)
North America > United States > Pennsylvania (0.04)
(5 more...)

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Public Health (0.46)
Health & Medicine > Consumer Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Efficient Quantum Approximate $k$NN Algorithm via Granular-Ball Computing

Xia, Shuyin, Tian, Xiaojiang, Yuan, Suzhen, Deng, Jeremiah D.

arXiv.org Artificial Intelligence, May-30-2025

High time complexity is one of the biggest challenges faced by $k$-Nearest Neighbors ($k$NN). Although current classical and quantum $k$NN algorithms have made some improvements, they still have a speed bottleneck when facing large amounts of data. To address this issue, we propose an innovative algorithm called Granular-Ball based Quantum $k$NN (GB-Q$k$NN). This approach achieves higher efficiency by first employing granular-balls, which reduces the amount of data that needs to be processed. The search process is then accelerated by adopting a Hierarchical Navigable Small World (HNSW) method. Moreover, we optimize the time-consuming steps of HNSW, such as distance calculation, via quantization, further reducing the time complexity of the construction and search processes. By combining granular-balls with quantization of the HNSW method, our approach takes advantage of both treatments and significantly reduces the time complexity of $k$NN-like algorithms, as revealed by a comprehensive complexity analysis.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2505.23066

Country: Asia > China (0.15)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.30)

HARMONIC: Harnessing LLMs for Tabular Data Synthesis and Privacy Protection

Neural Information Processing Systems, May-27-2025, 13:38:45 GMT

Data serves as the fundamental basis for advancing deep learning. Therefore, it is urgent to explore methods for effectively using models like LLMs to generate synthetic tabular data that is privacy-preserving yet similar to the original data. In this paper, we introduce a new framework, HARMONIC, for tabular data generation and evaluation by LLMs. In the data generation part of our framework, we employ fine-tuning to generate tabular data and enhance privacy, rather than the continued pre-training often used by previous small-scale LLM-based methods. In particular, we construct an instruction fine-tuning dataset based on the idea of the k-nearest neighbors algorithm to inspire LLMs to discover inter-row relationships. Through such fine-tuning, LLMs are trained to remember the format and connections of the data rather than the data itself, which reduces the risk of privacy leakage.

data synthesis and privacy protection, harmonic, tabular data synthesis, (4 more...)

Neural Information Processing Systems

Industry: Information Technology > Security & Privacy (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.97)

Multi-scale Consistency for Robust 3D Registration via Hierarchical Sinkhorn Tree

Neural Information Processing Systems, May-27-2025, 11:46:21 GMT

We study the problem of retrieving accurate correspondences through multi-scale consistency (MSC) for robust point cloud registration. Existing coarse-to-fine methods either suffer from severely noisy correspondences caused by unreliable coarse matching or struggle to form outlier-free coarse-level correspondence sets. To tackle this, we present Hierarchical Sinkhorn Tree (HST), a pruned tree structure designed to hierarchically measure the local consistency of each coarse correspondence across multiple feature scales, thereby filtering out the locally dissimilar ones. In this way, we convert the modeling of MSC for each correspondence into a BFS traversal with pruning of a K-ary tree rooted at the superpoint, with its K nearest neighbors in the feature pyramid serving as child nodes. To achieve efficient pruning and accurate vicinity characterization, we further propose a novel overlap-aware Sinkhorn Distance, which retains only the most likely overlapping points for local measurement and next-level exploration.

correspondence, hierarchical sinkhorn tree, multi-scale consistency, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.63)

Persistent Homology for High-dimensional Data Based on Spectral Methods

Neural Information Processing Systems, May-27-2025, 00:17:10 GMT

Persistent homology is a popular computational tool for analyzing the topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case traditional persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for existing refinements of persistent homology. As a remedy, we find that spectral distances on the k-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, make it possible to detect the correct topology even in the presence of high-dimensional noise.
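As a rough illustration of one spectral distance named above, the sketch below builds a symmetric, unweighted k-NN graph and computes a diffusion distance from the t-step random walk on it, using the standard definition in which squared differences of transition probabilities are weighted by the inverse stationary distribution. It covers only the distance step, not the persistent homology computation, and the graph construction, the choice t = 2, the toy points, and all function names are assumptions rather than the paper's setup.

```python
import math

def knn_graph(points, k):
    """Symmetric, unweighted k-NN adjacency (edge if either point lists the other)."""
    n = len(points)
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:k]:
            adj[i][j] = adj[j][i] = 1.0
    return adj

def matmul(a, b):
    """Plain dense product of two square matrices."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def diffusion_distance(adj, i, j, t=2):
    """Diffusion distance between nodes i and j after t random-walk steps."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    total = sum(deg)
    walk = [[adj[r][c] / deg[r] for c in range(n)] for r in range(n)]
    power = walk
    for _ in range(t - 1):
        power = matmul(power, walk)
    # Weight each coordinate by the inverse stationary probability deg[c] / total.
    return math.sqrt(sum((power[i][c] - power[j][c]) ** 2 / (deg[c] / total)
                         for c in range(n)))

# Hypothetical toy data: two well-separated groups of three points each.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0), (11.0, 0.0), (12.0, 0.0)]
adj = knn_graph(points, k=2)
within = diffusion_distance(adj, 0, 1)
across = diffusion_distance(adj, 0, 3)
print(within, across)
```

Points in the same group end up much closer under this distance than points in different groups, because random walks started in one group never reach the other.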

artificial intelligence, high-dimensional data, machine learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.65)

Meta-Neighborhoods

Neural Information Processing Systems, May-26-2025, 20:24:11 GMT

Making an adaptive prediction based on input is an important ability for general artificial intelligence. In this work, we step forward in this direction and propose a semi-parametric method, Meta-Neighborhoods, where predictions are made adaptively to the neighborhood of the input. We show that Meta-Neighborhoods is a generalization of k-nearest-neighbors. Due to the simpler manifold structure around a local neighborhood, Meta-Neighborhoods represent the predictive distribution p(y | x) more accurately. To reduce memory and computation overheads, we propose induced neighborhoods that summarize the training data into a much smaller dictionary.

artificial intelligence, machine learning, meta-neighborhood, (1 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.68)

Regret Bounds for Multilabel Classification in Sparse Label Regimes

Neural Information Processing Systems, May-26-2025, 19:53:28 GMT

Multi-label classification (MLC) has wide practical importance, but the theoretical understanding of its statistical properties is still limited. As an attempt to fill this gap, we thoroughly study upper and lower regret bounds for two canonical MLC performance measures, Hamming loss and Precision@$\kappa$. We consider two different statistical and algorithmic settings, a non-parametric setting tackled by plug-in classifiers à la $k$-nearest neighbors, and a parametric one tackled by empirical risk minimization operating on surrogate loss functions. For both, we analyze the interplay between a natural MLC variant of the low noise assumption, widely studied in binary classification, and the label sparsity, the latter being a natural property of large-scale MLC problems. We show that those conditions are crucial in improving the bounds, but the way they are tangled is not obvious, and also different across the two settings.
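A plug-in classifier in the k-nearest-neighbor style can be sketched as follows: estimate each label's conditional probability as the fraction of the k nearest training points carrying that label, then threshold at 1/2 when targeting Hamming loss, or keep the top-scoring labels when targeting Precision@K. These are the textbook plug-in rules and not necessarily the exact estimators analyzed in the paper; the toy data and function names are hypothetical, and the code assumes k does not exceed the number of training points.

```python
import math

def knn_label_probs(x, train_x, train_y, k):
    """Per-label frequency among the k nearest training points (assumes k <= len(train_x))."""
    order = sorted(range(len(train_x)),
                   key=lambda i: math.dist(x, train_x[i]))[:k]
    n_labels = len(train_y[0])
    return [sum(train_y[i][j] for i in order) / k for j in range(n_labels)]

def predict_hamming(probs):
    """Plug-in rule for Hamming loss: predict a label when its estimate exceeds 1/2."""
    return [1 if p > 0.5 else 0 for p in probs]

def predict_top_kappa(probs, kappa):
    """Plug-in rule for Precision@kappa: indices of the kappa highest estimates."""
    return sorted(range(len(probs)), key=lambda j: -probs[j])[:kappa]

# Hypothetical training set: 2-D inputs with three binary labels each.
train_x = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6)]
train_y = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 1]]

probs = knn_label_probs((0.2, 0.2), train_x, train_y, k=3)
hamming_prediction = predict_hamming(probs)
top_two = predict_top_kappa(probs, kappa=2)
print(probs, hamming_prediction, top_two)
```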

artificial intelligence, machine learning, multilabel classification, (2 more...)

Neural Information Processing Systems

Country: North America > United States (0.10)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.65)

No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations

Neural Information Processing Systems, May-26-2025, 17:51:51 GMT

This paper introduces FUNGI, Features from UNsupervised GradIents, a method to enhance the features of transformer encoders by leveraging self-supervised gradients. Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input. These gradients are projected to a lower dimension and then concatenated with the model's output embedding. The resulting features are evaluated on k-nearest neighbor classification over 11 datasets from vision, 5 from natural language processing, and 2 from audio. Across backbones spanning various sizes and pretraining strategies, FUNGI features provide consistent performance improvements over the embeddings.

deep frozen representation, machine learning, natural language, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.65)