Nearest Neighbor Methods


Active Nearest-Neighbor Learning in Metric Spaces

Neural Information Processing Systems

We propose a pool-based non-parametric active learning algorithm for general metric spaces, called MArgin Regularized Metric Active Nearest Neighbor (MARMANN), which outputs a nearest-neighbor classifier. We give prediction error guarantees that depend on the noisy-margin properties of the input sample, and are competitive with those obtained by previously proposed passive learners. We prove that the label complexity of MARMANN is significantly lower than that of any passive learner with similar error guarantees. Our algorithm is based on a generalized sample compression scheme and a new label-efficient active model-selection procedure.
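MARMANN itself is not spelled out in the abstract, so as a point of reference only, here is a minimal pool-based active learning loop around a 1-NN classifier with a farthest-first query rule; the query rule, seed-set size, and budget are illustrative assumptions, not the paper's procedure.

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def active_1nn(pool_X, oracle_y, budget=20, seed=0):
    # Start from a small randomly labeled seed set.
    rng = np.random.default_rng(seed)
    labeled = list(rng.choice(len(pool_X), size=2, replace=False))
    for _ in range(budget):
        nn = KNeighborsClassifier(n_neighbors=1).fit(pool_X[labeled], oracle_y[labeled])
        dist, _ = nn.kneighbors(pool_X, n_neighbors=1)  # distance to nearest labeled point
        dist[labeled] = -np.inf               # never re-query an already labeled point
        labeled.append(int(np.argmax(dist)))  # farthest-first query (illustrative rule)
    return KNeighborsClassifier(n_neighbors=1).fit(pool_X[labeled], oracle_y[labeled])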


ResMem: Learn what you can and memorize the rest

Neural Information Processing Systems

The impressive generalization performance of modern neural networks is attributed in part to their ability to implicitly memorize complex training patterns. Inspired by this, we explore a novel mechanism to improve model generalization via explicit memorization. Specifically, we propose the residual-memorization (ResMem) algorithm, a new method that augments an existing prediction model (e.g., a neural network) by fitting the model's residuals with a k-nearest neighbor based regressor. The final prediction is then the sum of the original model and the fitted residual regressor. By construction, ResMem can explicitly memorize the training labels, even when the base model has low capacity. We start by formulating a stylized linear regression problem and rigorously show that ResMem results in a more favorable test risk over a base linear neural network. Then, we empirically show that ResMem consistently improves the test set generalization of the original prediction model across standard vision and natural language processing benchmarks.
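A minimal sketch of the ResMem construction for regression, assuming a ridge base model and k=5 (illustrative choices, not the paper's settings): fit a k-NN regressor on the base model's training residuals and add the two predictions at test time.

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor

X_train = np.random.randn(200, 10)
y_train = X_train @ np.random.randn(10) + 0.1 * np.random.randn(200)

base = Ridge().fit(X_train, y_train)
residuals = y_train - base.predict(X_train)          # what the base model failed to fit
res_knn = KNeighborsRegressor(n_neighbors=5).fit(X_train, residuals)

def resmem_predict(X):
    # Final prediction = base model + memorized residual component.
    return base.predict(X) + res_knn.predict(X)

With n_neighbors=1 the combined model interpolates the training labels exactly, which is the explicit-memorization property the abstract describes.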


A Multilabel Classification Framework for Approximate Nearest Neighbor Search

Neural Information Processing Systems

Both supervised and unsupervised machine learning algorithms have been used to learn partition-based index structures for approximate nearest neighbor (ANN) search. Existing supervised algorithms formulate the learning task as finding a partition in which the nearest neighbors of a training set point belong to the same partition element as the point itself, so that the nearest neighbor candidates can be retrieved by naive lookup or backtracking search. We formulate candidate set selection in ANN search directly as a multilabel classification problem where the labels correspond to the nearest neighbors of the query point, and interpret the partitions as partitioning classifiers for solving this task. Empirical results suggest that the natural classifier based on this interpretation leads to strictly improved performance when combined with any unsupervised or supervised partitioning strategy. We also prove a sufficient condition for consistency of a partitioning classifier for ANN search, and illustrate the result by verifying this condition for chronological k-d trees.
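As a sketch of the candidate-set-selection view, the snippet below uses a KMeans partition as the "partitioning classifier": the query is assigned to a partition element, that element's points form the candidate set, and the candidates are re-ranked exactly. KMeans and the brute-force re-ranking are illustrative stand-ins, not the paper's learned partitions.

import numpy as np
from sklearn.cluster import KMeans

X = np.random.randn(1000, 32)                 # indexed corpus
part = KMeans(n_clusters=16, n_init=10).fit(X)
cells = part.labels_                          # partition element of each corpus point

def ann_query(q, k=10):
    cell = part.predict(q[None, :])[0]        # "classify" the query into a cell
    cand = np.where(cells == cell)[0]         # candidate set = that cell's points
    d = np.linalg.norm(X[cand] - q, axis=1)
    return cand[np.argsort(d)[:k]]            # exact re-rank within the candidates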


Distributionally Robust Weighted k-Nearest Neighbors

Neural Information Processing Systems

Learning a robust classifier from a few samples remains a key challenge in machine learning. A major thrust of research has focused on developing k-nearest neighbor (k-NN) algorithms combined with metric learning that captures similarities between samples. When samples are limited, robustness is especially crucial to ensure the generalization capability of the classifier. In this paper, we study a minimax distributionally robust formulation of weighted k-nearest neighbors, which aims to find the optimal weighted k-NN classifier that hedges against feature uncertainties. We develop an algorithm, Dr.k-NN, that efficiently solves this functional optimization problem and assigns minimax-optimal weights to training samples when performing classification. These weights are class-dependent and are determined by the similarities of sample features under the least favorable scenarios. The proposed framework can be shown to be equivalent to a Lipschitz norm regularization problem.
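Prediction with a weighted k-NN classifier of the kind described above can be sketched as follows: each training sample carries a weight, and the predicted class maximizes the summed weight of its k nearest neighbors. The uniform weights here are placeholders; Dr.k-NN's contribution is computing the class-dependent, minimax-optimal weights, which is not reproduced.

import numpy as np

def weighted_knn_predict(q, X, y, w, k=5):
    idx = np.argsort(np.linalg.norm(X - q, axis=1))[:k]    # k nearest training points
    classes = np.unique(y)
    scores = [w[idx][y[idx] == c].sum() for c in classes]  # weight mass per class
    return classes[int(np.argmax(scores))]

X = np.random.randn(100, 5)
y = np.random.randint(0, 2, size=100)
w = np.ones(100)   # placeholder for the minimax-optimal, class-dependent weights
print(weighted_knn_predict(np.zeros(5), X, y, w))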


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

This paper develops a new way of analyzing k-Nearest Neighbor prediction for classification problems. Nearest Neighbor prediction is a simple and well-studied classification method that, although the first results date back decades, has recently seen renewed interest and new development. This paper provides a new angle on Nearest Neighbor analysis by accounting for the fact that NN classifiers automatically adapt to the local scaling (with respect to the probability distribution) of the input space. This aspect has not been taken into account in previous studies of NN. Previous analyses of NN have been in terms of smoothness parameters, and the bounds provided reflect a worst case of these smoothness parameters over the data space.


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

The paper proposes a new approach to Mahalanobis metric learning that views metric learning as a structured-prediction problem. The structure we seek to establish is the set, h, of appropriate k-Nearest Neighbors (k-NNs) of an instance; appropriate in the sense that it leads to a small classification error. The authors' idea is quite elegant: in learning a metric for k-NN prediction, we can have a correct prediction for an instance x_i as long as the majority of its k-NNs are of the same class as x_i. The metric learning algorithm they propose does exactly that, i.e. it looks for the Mahalanobis metric matrix W that leads to k-nearest neighborhoods for each instance x_i such that the majority of the instances in the neighborhood belong to the correct class, i.e. that of x_i. Note here that there can be many such neighborhoods that predict the correct label for a given instance x_i; it is enough to prefer at least one of them over all the ones that predict the wrong label.
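The target structure can be checked directly: under a Mahalanobis matrix W, does the majority of each point's k-nearest neighborhood share its class? The sketch below verifies that condition; learning W is the paper's contribution and is not shown (W = identity would be a placeholder input).

import numpy as np

def majority_correct_fraction(X, y, W, k=3):
    diff = X[:, None, :] - X[None, :, :]                  # pairwise difference vectors
    d2 = np.einsum('ijk,kl,ijl->ij', diff, W, diff)       # squared Mahalanobis distances
    np.fill_diagonal(d2, np.inf)                          # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]                    # k-NN under the metric W
    return np.mean([(y[nn[i]] == y[i]).sum() > k / 2 for i in range(len(X))])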


Behavioral Entropy-Guided Dataset Generation for Offline Reinforcement Learning

arXiv.org Artificial Intelligence

Entropy-based objectives are widely used to perform state space exploration in reinforcement learning (RL) and dataset generation for offline RL. Behavioral entropy (BE), a rigorous generalization of classical entropies that incorporates cognitive and perceptual biases of agents, was recently proposed for discrete settings and shown to be a promising metric for robotic exploration problems. In this work, we propose using BE as a principled exploration objective for systematically generating datasets that provide diverse state space coverage in complex, continuous, potentially high-dimensional domains. To achieve this, we extend the notion of BE to continuous settings, derive tractable k-nearest neighbor estimators, provide theoretical guarantees for these estimators, and develop practical reward functions that can be used with standard RL methods to learn BE-maximizing policies. Using standard MuJoCo environments, we experimentally compare the performance of offline RL algorithms for a variety of downstream tasks on datasets generated using BE, Rényi, and Shannon entropy-maximizing policies, as well as the SMM and RND algorithms. We find that offline RL algorithms trained on datasets collected using BE outperform those trained on datasets collected using Shannon entropy, SMM, and RND on all tasks considered, and on 80% of the tasks compared to datasets collected using Rényi entropy.
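The paper's estimators target behavioral entropy; for orientation, the classical Kozachenko-Leonenko k-NN estimator of Shannon differential entropy below shows the general shape such k-nearest-neighbor estimators take (this is not the BE estimator itself).

import numpy as np
from scipy.special import digamma, gammaln
from sklearn.neighbors import NearestNeighbors

def kl_entropy(X, k=3):
    n, d = X.shape
    dist, _ = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
    eps = dist[:, k]                                         # distance to the k-th neighbor
    log_ball = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)  # log volume of the unit d-ball
    return digamma(n) - digamma(k) + log_ball + d * np.mean(np.log(eps))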


Identifying Flaky Tests in Quantum Code: A Machine Learning Approach

arXiv.org Artificial Intelligence

Testing and debugging quantum software pose significant challenges due to the inherent complexities of quantum mechanics, such as superposition and entanglement. One challenge is indeterminacy, a fundamental characteristic of quantum systems, which increases the likelihood of flaky tests in quantum programs. To the best of our knowledge, there is a lack of comprehensive studies on quantum flakiness in the existing literature. In this paper, we present a novel machine learning platform that leverages multiple machine learning models to automatically detect flaky tests in quantum programs. Our evaluation shows that the extreme gradient boosting and decision tree-based models outperform other models (i.e., random forest, k-nearest neighbors, and support vector machine), achieving the highest F1 score and Matthews Correlation Coefficient in a balanced dataset and an imbalanced dataset, respectively. Furthermore, we expand the currently limited dataset for researchers interested in quantum flaky tests. In the future, we plan to explore the development of unsupervised learning techniques to detect and classify quantum flaky tests more effectively. These advancements aim to improve the reliability and robustness of quantum software testing.
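A sketch of the model comparison described above, on a synthetic imbalanced dataset standing in for the (not reproduced) quantum flaky-test features; XGBoost is omitted to keep the sketch dependency-free.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in: 20 features, 80/20 class imbalance.
X, y = make_classification(n_samples=500, n_features=20, weights=[0.8, 0.2], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

models = {"DT": DecisionTreeClassifier(), "RF": RandomForestClassifier(),
          "kNN": KNeighborsClassifier(), "SVM": SVC()}
for name, m in models.items():
    pred = m.fit(Xtr, ytr).predict(Xte)
    print(name, f1_score(yte, pred), matthews_corrcoef(yte, pred))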


A Novel Zero-Touch, Zero-Trust, AI/ML Enablement Framework for IoT Network Security

arXiv.org Artificial Intelligence

The IoT facilitates a connected, intelligent, and sustainable society; therefore, it is imperative to protect the IoT ecosystem. IoT-based 5G and 6G networks will increasingly leverage machine learning and artificial intelligence (ML/AI) to pave the way for autonomous and collaborative secure IoT networks. Zero-touch, zero-trust IoT security with AI/ML enablement frameworks offers a powerful approach to securing the expanding landscape of Internet of Things (IoT) devices. This paper presents a novel framework that integrates Zero Trust, Zero Touch, and AI/ML for the detection, mitigation, and prevention of DDoS attacks in modern IoT ecosystems. The framework establishes zero trust for all IoT traffic, including fixed and mobile 5G/6G network traffic, and enforces data security through zero-touch quarantine and dynamic policy enforcement. We perform a comparative analysis of five machine learning models, namely XGBoost, Random Forest, K-Nearest Neighbors, Stochastic Gradient Descent, and Naive Bayes, comparing these models on accuracy, precision, recall, F1-score, and ROC-AUC. Results show that the best performance in detecting and mitigating different DDoS vectors comes from the ensemble-based approaches.
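The ROC-AUC comparison can be sketched the same way on hypothetical traffic features (the zero-trust pipeline and real DDoS flows are not modeled, and XGBoost is again omitted):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1000, n_features=15, random_state=1)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=1)
for m in (RandomForestClassifier(), KNeighborsClassifier(),
          SGDClassifier(loss="log_loss"), GaussianNB()):
    m.fit(Xtr, ytr)
    score = m.predict_proba(Xte)[:, 1]   # all four models expose predict_proba here
    print(type(m).__name__, roc_auc_score(yte, score))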


Variance-Adjusted Cosine Distance as Similarity Metric

arXiv.org Machine Learning

Cosine similarity is a popular measure of the similarity between two vectors in an inner product space. It is widely used in many data classification algorithms such as k-Nearest Neighbors and clustering. This study demonstrates limitations of cosine similarity. In particular, the traditional cosine similarity metric is valid only in Euclidean space, whereas the original data reside in a random variable space. When there is variance and correlation in the data, cosine distance is not a completely accurate measure of similarity. While new similarity and distance metrics have been developed to make up for the limitations of cosine similarity, these metrics are used as substitutes for cosine distance and do not modify cosine distance itself to overcome its limitations. We therefore propose a modified cosine similarity metric in which cosine distance is adjusted by the variance-covariance of the data. The variance-adjusted cosine distance gives better similarity performance than traditional cosine distance. KNN modeling on the Wisconsin Breast Cancer Dataset is performed using both the traditional and the modified cosine similarity measures and compared. The modified formula achieves 100% test accuracy on the data.
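One plausible reading of "cosine distance adjusted by the variance-covariance of the data" is cosine similarity computed under the inverse-covariance (Mahalanobis-style) inner product, sketched below; the paper's exact formula may differ, so treat this as an assumption.

import numpy as np

def variance_adjusted_cosine(u, v, X):
    S_inv = np.linalg.inv(np.cov(X, rowvar=False))  # inverse variance-covariance of the data
    num = u @ S_inv @ v                             # covariance-adjusted inner product
    return num / np.sqrt((u @ S_inv @ u) * (v @ S_inv @ v))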