AITopics | Nearest Neighbor Methods

Collaborating Authors

Nearest Neighbor Methods

News Overviews Instructional Materials AI-Alerts Classics

Structure-Guided Input Graph for GNNs facing Heterophily

Tenorio, Victor M., Navarro, Madeline, Rey, Samuel, Segarra, Santiago, Marques, Antonio G.

arXiv.org Artificial IntelligenceDec-2-2024

Graph Neural Networks (GNNs) have emerged as a promising tool to handle data exhibiting an irregular structure. However, most GNN architectures perform well on homophilic datasets, where the labels of neighboring nodes are likely to be the same. In recent years, an increasing body of work has been devoted to the development of GNN architectures for heterophilic datasets, where labels do not exhibit this low-pass behavior. In this work, we create a new graph in which nodes are connected if they share structural characteristics, meaning a higher chance of sharing their labels, and then use this new graph in the GNN architecture. To do this, we compute the k-nearest neighbors graph according to distances between structural features, which are either (i) role-based, such as degree, or (ii) global, such as centrality measures. Experiments show that the labels are smoother in this newly defined graph and that the performance of GNN architectures improves when using this alternative structure.

artificial intelligence, graph, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2412.01757

Country: North America > United States (0.49)

Genre: Research Report (0.64)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Add feedback

WaKA: Data Attribution using K-Nearest Neighbors and Membership Privacy Principles

Mesana, Patrick, Bénesse, Clément, Lautraite, Hadrien, Caporossi, Gilles, Gambs, Sébastien

arXiv.org Artificial IntelligenceDec-1-2024

In this paper, we introduce WaKA (Wasserstein K-nearest-neighbors Attribution), a novel attribution method that leverages principles from the LiRA (Likelihood Ratio Attack) framework and k-nearest neighbors classifiers (k-NN). WaKA efficiently measures the contribution of individual data points to the model's loss distribution, analyzing every possible k-NN that can be constructed using the training set, without requiring to sample subsets of the training set. WaKA is versatile and can be used a posteriori as a membership inference attack (MIA) to assess privacy risks or a priori for privacy influence measurement and data valuation. Thus, WaKA can be seen as bridging the gap between data attribution and membership inference attack (MIA) by providing a unified framework to distinguish between a data point's value and its privacy risk. For instance, we have shown that self-attribution values are more strongly correlated with the attack success rate than the contribution of a point to the model generalization. WaKA's different usage were also evaluated across diverse real-world datasets, demonstrating performance very close to LiRA when used as an MIA on k-NN classifiers, but with greater computational efficiency. Additionally, WaKA shows greater robustness than Shapley Values for data minimization tasks (removal or addition) on imbalanced datasets.

artificial intelligence, machine learning, survey article, (16 more...)

arXiv.org Artificial Intelligence

2411.01357

Country:

North America > United States (0.46)
North America > Canada > Quebec (0.29)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback

K-GBS3FCM -- KNN Graph-Based Safe Semi-Supervised Fuzzy C-Means

Santos, Gabriel, Julia, Rita, Nascimento, Marcelo

arXiv.org Artificial IntelligenceNov-21-2024

Clustering data using prior domain knowledge, starting from a partially labeled set, has recently been widely investigated. Often referred to as semi-supervised clustering, this approach leverages labeled data to enhance clustering accuracy. To maximize algorithm performance, it is crucial to ensure the safety of this prior knowledge. Methods addressing this concern are termed safe semi-supervised clustering (S3C) algorithms. This paper introduces the KNN graph-based safety-aware semi-supervised fuzzy c-means algorithm (K-GBS3FCM), which dynamically assesses neighborhood relationships between labeled and unlabeled data using the K-Nearest Neighbors (KNN) algorithm. This approach aims to optimize the use of labeled data while minimizing the adverse effects of incorrect labels. Additionally, it is proposed a mechanism that adjusts the influence of labeled data on unlabeled ones through regularization parameters and the average safety degree. Experimental results on multiple benchmark datasets demonstrate that the graph-based approach effectively leverages prior knowledge to enhance clustering accuracy. The proposed method was significantly superior in 64% of the 56 test configurations, obtaining higher levels of clustering accuracy when compared to other semi-supervised and traditional unsupervised methods. This research highlights the potential of integrating graph-based approaches, such as KNN, with established techniques to develop advanced clustering algorithms, offering significant applications in fields that rely on both labeled and unlabeled data for more effective clustering.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2411.14728

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.87)

Add feedback

One-Layer Transformer Provably Learns One-Nearest Neighbor In Context

Li, Zihao, Cao, Yuan, Gao, Cheng, He, Yihan, Liu, Han, Klusowski, Jason M., Fan, Jianqing, Wang, Mengdi

arXiv.org Artificial IntelligenceNov-16-2024

Transformers have achieved great success in recent years. Interestingly, transformers have shown particularly strong in-context learning capability -- even without fine-tuning, they are still able to solve unseen tasks well purely based on task-specific prompts. In this paper, we study the capability of one-layer transformers in learning one of the most classical nonparametric estimators, the one-nearest neighbor prediction rule. Under a theoretical framework where the prompt contains a sequence of labeled training data and unlabeled test data, we show that, although the loss function is nonconvex when trained with gradient descent, a single softmax attention layer can successfully learn to behave like a one-nearest neighbor classifier. Our result gives a concrete example of how transformers can be trained to implement nonparametric machine learning algorithms, and sheds light on the role of softmax attention in transformer models.

artificial intelligence, machine learning, transformer, (16 more...)

arXiv.org Artificial Intelligence

2411.1083

Country: Asia (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.48)

Add feedback

Partial Multi-View Clustering via Meta-Learning and Contrastive Feature Alignment

Chen, BoHao

arXiv.org Artificial IntelligenceNov-14-2024

Partial multi-view clustering (PVC) presents significant challenges practical research problem for data analysis in real-world applications, especially when some views of the data are partially missing. Existing clustering methods struggle to handle incomplete views effectively, leading to suboptimal clustering performance. In this paper, we propose a novel dual optimization framework based on contrastive learning, which aims to maximize the consistency of latent features in incomplete multi-view data and improve clustering performance through deep learning models. By combining a fine-tuned Vision Transformer and k-nearest neighbors (KNN), we fill in missing views and dynamically adjust view weights using self-supervised learning and meta-learning. Experimental results demonstrate that our framework outperforms state-of-the-art clustering models on the BDGP and HW datasets, particularly in handling complex and incomplete multi-view data.

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Artificial Intelligence

2411.09758

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Add feedback

A Comparative Analysis of Machine Learning Models for DDoS Detection in IoT Networks

Shakya, Sushil, Abbas, Robert

arXiv.org Artificial IntelligenceNov-8-2024

This paper presents the detection of DDoS attacks in IoT networks using machine learning models. Their rapid growth has made them highly susceptible to various forms of cyberattacks, many of whose security procedures are implemented in an irregular manner. It evaluates the efficacy of different machine learning models, such as XGBoost, K-Nearest Neighbours, Stochastic Gradient Descent, and Na\"ive Bayes, in detecting DDoS attacks from normal network traffic. Each model has been explained on several performance metrics, such as accuracy, precision, recall, and F1-score to understand the suitability of each model in real-time detection and response against DDoS threats. This comparative analysis will, therefore, enumerate the unique strengths and weaknesses of each model with respect to the IoT environments that are dynamic and hence moving in nature. The effectiveness of these models is analyzed, showing how machine learning can greatly enhance IoT security frameworks, offering adaptive, efficient, and reliable DDoS detection capabilities. These findings have shown the potential of machine learning in addressing the pressing need for robust IoT security solutions that can mitigate modern cyber threats and assure network integrity.

artificial intelligence, detection, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2411.0589

Genre: Research Report (0.50)

Industry:

Information Technology > Security & Privacy (1.00)
Government > Military > Cyberwarfare (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.36)

Add feedback

Enhancing binary classification: A new stacking method via leveraging computational geometry

Wu, Wei, Tang, Liang, Zhao, Zhongjie, Teo, Chung-Piaw

arXiv.org Artificial IntelligenceOct-30-2024

Binary classification is a fundamental task in machine learning and data science, with applications spanning numerous domains, including spam detection, medical diagnostics, image recognition, credit scoring. The goal is to predict a binary outcome--typically labeled as 0 or 1--based on a set of input features. Various machine learning algorithms, such as logistic regression (LR), k-nearest neighbors (kNN), support vector machines (SVM), and neural network (NN), are commonly employed for binary classification tasks. These algorithms can be mainly divided into two categories: those with interpretability, which are convenient for analysis and control (e.g., LR); and those without interpretability but with potentially good classification performance (e.g., NN). Ensemble learning, a powerful technique in predictive modeling, has gained widespread recognition for its ability to improve model performance by combining the strengths of multiple learning algorithms [1]. Among ensemble methods, stacking stands out by integrating the predictions of diverse base models (different learning algorithms) through a meta-model, resulting in enhanced prediction accuracy compared to only using the best base model [2]. Stacking has demonstrated significant applications in classification problems such as network intrusion detection [3, 4], cancer type classification [5], credit lending [6], and protein-protein binding affinity prediction [7].

artificial intelligence, base model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2410.22722

Country: Asia (0.28)

Genre: Research Report > New Finding (0.34)

Industry:

Information Technology > Security & Privacy (0.34)
Health & Medicine > Therapeutic Area > Oncology (0.34)
Banking & Finance > Credit (0.34)
Health & Medicine > Diagnostic Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Add feedback

Adversarial Examples for k-Nearest Neighbor Classifiers Based on Higher-Order Voronoi Diagrams

Neural Information Processing SystemsOct-11-2024, 13:05:13 GMT

Adversarial examples are a widely studied phenomenon in machine learning models. While most of the attention has been focused on neural networks, other practical models also suffer from this issue. In this work, we propose an algorithm for evaluating the adversarial robustness of k -nearest neighbor classification, i.e., finding a minimum-norm adversarial example. Diverging from previous proposals, we propose the first geometric approach by performing a search that expands outwards from a given input point. On a high level, the search radius expands to the nearby higher-order Voronoi cells until we find a cell that classifies differently from the input point.

artificial intelligence, k-nearest neighbor classifier, machine learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback

TPU-KNN: K Nearest Neighbor Search at Peak FLOP/s

Neural Information Processing SystemsOct-11-2024, 08:23:33 GMT

This paper presents a novel nearest neighbor search algorithm achieving TPU (Google Tensor Processing Unit) peak performance, outperforming state-of-the-art GPU algorithms with similar level of recall. The design of the proposed algorithm is motivated by an accurate accelerator performance model that takes into account both the memory and instruction bottlenecks. Our algorithm comes with an analytical guarantee of recall in expectation and does not require maintaining sophisticated index data structure or tuning, making it suitable for applications with frequent updates. Our work is available in the open-source package of Jax and Tensorflow on TPU.

artificial intelligence, machine learning, nearest neighbor search, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.40)

Add feedback

On Convergence of Nearest Neighbor Classifiers over Feature Transformations

Neural Information Processing SystemsOct-10-2024, 19:28:26 GMT

The k-Nearest Neighbors (kNN) classifier is a fundamental non-parametric machine learning algorithm. However, it is well known that it suffers from the curse of dimensionality, which is why in practice one often applies a kNN classifier on top of a (pre-trained) feature transformation. From a theoretical perspective, most, if not all theoretical results aimed at understanding the kNN classifier are derived for the raw feature space. This leads to an emerging gap between our theoretical understanding of kNN and its practical applications. In this paper, we take a first step towards bridging this gap.

artificial intelligence, classifier, machine learning, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.74)

Add feedback