AITopics | nearest neighbor graph

Researchers have used nearest neighbor graphs to transform classical machine learning problems on tabular data into node classification tasks to solve with graph representation learning methods. Such artificial structures often reflect the homophily assumption, believed to be a key factor in the performances of deep graph networks. In light of recent results demystifying these beliefs, we introduce a theoretical framework to understand the benefits of Nearest Neighbor (NN) graphs when a graph structure is missing. We formally analyze the Cross-Class Neighborhood Similarity (CCNS), used to empirically evaluate the usefulness of structures, in the context of nearest neighbor graphs. Moreover, we study the class separability induced by deep graph networks on a k-NN graph. Motivated by the theory, our quantitative experiments demonstrate that, under full supervision, employing a k-NN graph offers no benefits compared to a structure-agnostic baseline. Qualitative analyses suggest that our framework is good at estimating the CCNS and hint at k-NN graphs never being useful for such classification tasks under full supervision, thus advocating for the study of alternative graph construction techniques in combination with deep graph networks.
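The graph construction that this abstract discusses can be illustrated with a small sketch. It is not the authors' code: it builds a brute-force k-NN graph over toy tabular rows and measures edge homophily (the fraction of edges joining same-class nodes), which is used here only as a simple stand-in for the homophily assumption and is not the CCNS statistic analyzed in the paper. The function names `knn_graph` and `edge_homophily` and the toy data are assumptions made for illustration.

```python
import math


def knn_graph(points, k):
    """Return directed k-NN edges (i, j), where j is one of the k points nearest to i."""
    edges = []
    for i, p in enumerate(points):
        # Brute force: rank every other point by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        edges.extend((i, j) for j in others[:k])
    return edges


def edge_homophily(edges, labels):
    """Fraction of edges whose two endpoints carry the same class label."""
    same = sum(1 for i, j in edges if labels[i] == labels[j])
    return same / len(edges)


# Toy tabular data (hypothetical): two well-separated classes in 2-D.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0), (5.0, 5.1)]
y = [0, 0, 0, 1, 1, 1]

edges = knn_graph(X, k=2)
h = edge_homophily(edges, y)
print(f"{len(edges)} edges, edge homophily = {h:.2f}")
```

On well-separated classes such as these, every neighbor shares the node's label, so the artificial graph is fully homophilous by construction. The abstract's point is that this property alone does not imply a benefit over a structure-agnostic baseline.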

class distribution induced, nearest neighbor graph, node classification, (8 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Learning Nearest Neighbor Graphs from Noisy Distance Samples

Blake Mason, Ardhendu Tripathy, Robert Nowak

Neural Information Processing Systems | Oct-3-2025, 06:56:23 GMT

In this paper, we propose an active algorithm to find the graph with high probability and analyze its query complexity.

algorithm, nearest neighbor graph, triangle inequality, (10 more...)

Neural Information Processing Systems

Country:

North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > Canada (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.46)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.34)

EmbedOR: Provable Cluster-Preserving Visualizations with Curvature-Based Stochastic Neighbor Embeddings

Saidi, Tristan Luca, Hickok, Abigail, Rieck, Bastian, Blumberg, Andrew J.

arXiv.org Artificial Intelligence | Sep-5-2025

Stochastic Neighbor Embedding (SNE) algorithms like UMAP and tSNE often produce visualizations that do not preserve the geometry of noisy and high-dimensional data. In particular, they can spuriously separate connected components of the underlying data submanifold and can fail to find clusters in well-clusterable data. To address these limitations, we propose EmbedOR, an SNE algorithm that incorporates discrete graph curvature. Our algorithm stochastically embeds the data using a curvature-enhanced distance metric that emphasizes underlying cluster structure. Critically, we prove that the EmbedOR distance metric extends consistency results for tSNE to a much broader class of datasets. We also describe extensive experiments on synthetic and real data that demonstrate the visualization and geometry-preservation capabilities of EmbedOR. We find that, unlike other SNE algorithms and UMAP, EmbedOR is much less likely to fragment continuous, high-density regions of the data. Finally, we demonstrate that the EmbedOR distance metric can be used as a tool to annotate existing visualizations to identify fragmentation and provide deeper insight into the underlying geometry of the data.

artificial intelligence, data mining, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2509.03703

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science > Data Mining (0.67)

Meta-Instance Selection. Instance Selection as a Classification Problem with Meta-Features

Blachnik, Marcin, Ciepliński, Piotr

arXiv.org Artificial Intelligence | Jan-20-2025

Data pruning, or instance selection, is an important problem in machine learning, especially for the nearest neighbor classifier. However, although data pruning speeds up the prediction phase, the speed and efficiency of the pruning process itself remain an issue. In response, the study proposes transforming the instance selection process into a classification task conducted in a unified meta-feature space, where each instance is classified into either the "to keep" or the "to remove" class. This approach requires training an appropriate meta-classifier, which can be developed from historical instance selection results on other datasets, using reference instance selection methods as a labeling tool. This work proposes constructing the meta-feature space from properties extracted from the nearest neighbor graph. Experiments conducted on 17 datasets of varying sizes and five reference instance selection methods (ENN, Drop3, ICF, HMN-EI, and CCIS) demonstrate that the proposed solution achieves results comparable to reference instance selection methods while significantly reducing computational complexity. In the proposed approach, the computational complexity of the system depends only on identifying the k-nearest neighbors for each data sample and running the meta-classifier.
Additionally, the scaling law translates into a requirement for huge compute power during both training and prediction, which is not always feasible in real-life scenarios where compute resources are limited. In that case, both the training data and the prediction model should require few computing resources. Therefore, the training set should be as small as possible while preserving the prediction accuracy of the original training set. This issue is not new; its study began primarily for the nearest neighbor classifier under the name of instance selection. Thus, already in the late 1960s and early 1970s, algorithms such as Condensed Nearest Neighbor (CNN), Edited Nearest Neighbor (ENN), and many others were developed. The benchmarks of instance selection indicate the Drop3 [3] and ICF [4] algorithms as the most widely used; despite not being new, they offer an excellent balance between the prediction accuracy of the kNN algorithm and the reduction in size of the stored set of prototypes (reduction_rate) [5]. These algorithms are applicable not only as elements of the learning process for the kNN algorithm (prototype selection as part of the learning process) but also as universal algorithms for reducing the size of the training set for any classifier, thereby accelerating the learning of complex predictive models, the search for optimal parameters, etc. Examples of such applications can be found in [6, 7] or in [5].
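To make the idea of a reference instance selection method concrete, here is a minimal sketch of Edited Nearest Neighbor (ENN), one of the methods named above. It assumes the commonly described editing rule, in which an instance is kept only when its label agrees with the majority label of its k nearest neighbors. It is not the meta-classifier proposed in the paper, and the function name, the default `k=3`, and the toy data are illustrative assumptions.

```python
import math
from collections import Counter


def edited_nearest_neighbor(points, labels, k=3):
    """Return indices of instances whose label matches the majority of their k neighbors."""
    keep = []
    for i, p in enumerate(points):
        # Brute-force search for the k nearest neighbors of instance i (itself excluded).
        neighbors = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )[:k]
        majority, _ = Counter(labels[j] for j in neighbors).most_common(1)[0]
        if majority == labels[i]:
            keep.append(i)  # "to keep"; every other instance is "to remove"
    return keep


# Toy data (hypothetical): two clean clusters plus one mislabeled point (index 8).
X = [
    (0.0, 0.0), (0.2, 0.0), (0.0, 0.2), (0.2, 0.2),   # class 0
    (5.0, 5.0), (5.2, 5.0), (5.0, 5.2), (5.2, 5.2),   # class 1
    (0.1, 0.1),                                        # labeled 1, sits inside class 0
]
y = [0, 0, 0, 0, 1, 1, 1, 1, 1]

kept = edited_nearest_neighbor(X, y, k=3)
print("kept indices:", kept)
```

The keep/remove decisions produced by a rule like this are the kind of output that the abstract describes using as training labels for a meta-classifier.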

artificial intelligence, data quality, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2501.11526

Country:

North America > United States (0.04)
Europe > Poland (0.04)
Europe > France (0.04)
Asia (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Quality > Instance Selection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

On Class Distributions Induced by Nearest Neighbor Graphs for Node Classification of Tabular Data

Neural Information Processing Systems | Jan-18-2025, 15:27:18 GMT

Researchers have used nearest neighbor graphs to transform classical machine learning problems on tabular data into node classification tasks to solve with graph representation learning methods. Such artificial structures often reflect the homophily assumption, believed to be a key factor in the performances of deep graph networks. In light of recent results demystifying these beliefs, we introduce a theoretical framework to understand the benefits of Nearest Neighbor (NN) graphs when a graph structure is missing. We formally analyze the Cross-Class Neighborhood Similarity (CCNS), used to empirically evaluate the usefulness of structures, in the context of nearest neighbor graphs. Moreover, we study the class separability induced by deep graph networks on a k-NN graph.

class distribution induced, nearest neighbor graph, node classification, (4 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Recovering Manifold Structure Using Ollivier-Ricci Curvature

Saidi, Tristan Luca, Hickok, Abigail, Blumberg, Andrew J.

arXiv.org Artificial Intelligence | Oct-1-2024

We introduce ORC-ManL, a new algorithm to prune spurious edges from nearest neighbor graphs using a criterion based on Ollivier-Ricci curvature and estimated metric distortion. Our motivation comes from manifold learning: we show that when the data generating the nearest-neighbor graph consists of noisy samples from a low-dimensional manifold, edges that shortcut through the ambient space have more negative Ollivier-Ricci curvature than edges that lie along the data manifold. We demonstrate that our method outperforms alternative pruning methods and that it significantly improves performance on many downstream geometric data analysis tasks that use nearest neighbor graphs as input. Specifically, we evaluate on manifold learning, persistent homology, dimension estimation, and others. We also show that ORC-ManL can be used to improve clustering and manifold learning of single-cell RNA sequencing data. Finally, we provide empirical convergence experiments that support our theoretical findings.

curvature, manifold, shortcut edge, (14 more...)

arXiv.org Artificial Intelligence

2410.01149

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Anomaly Detection with Score functions based on Nearest Neighbor Graphs

Neural Information Processing Systems | Apr-6-2023, 13:58:01 GMT

We propose a novel non-parametric adaptive anomaly detection algorithm for high dimensional data based on score functions derived from nearest neighbor graphs on n-point nominal data. Anomalies are declared whenever the score of a test sample falls below q, which is supposed to be the desired false alarm level. The resulting anomaly detector is shown to be asymptotically optimal in that it is uniformly most powerful for the specified false alarm level, q, for the case when the anomaly density is a mixture of the nominal and a known density. Our algorithm is computationally efficient, being linear in dimension and quadratic in data size. It does not require choosing complicated tuning parameters or function approximation classes and it can adapt to local structure such as local change in dimensionality.
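The decision rule in this abstract (declare an anomaly when the score falls below q) can be sketched as follows. The abstract does not spell out the score function, so this sketch assumes one: the fraction of nominal points whose k-th nearest neighbor distance is at least as large as that of the test point, with leave-one-out distances for the nominal points. The function names, the choice of statistic, and the toy grid are illustrative assumptions, not the authors' implementation.

```python
import math


def kth_nn_distance(x, data, k, skip=None):
    """Distance from x to its k-th nearest point in data, ignoring index `skip`."""
    dists = sorted(math.dist(x, p) for idx, p in enumerate(data) if idx != skip)
    return dists[k - 1]


def anomaly_score(x, nominal, k):
    """Fraction of nominal points at least as isolated as x (assumed score function)."""
    g_x = kth_nn_distance(x, nominal, k)
    # Leave-one-out statistic for each nominal point: all-pairs brute force.
    g_nominal = [kth_nn_distance(p, nominal, k, skip=i) for i, p in enumerate(nominal)]
    return sum(1 for g in g_nominal if g >= g_x) / len(nominal)


def is_anomaly(x, nominal, k, q):
    """Declare an anomaly when the score falls below the false alarm level q."""
    return anomaly_score(x, nominal, k) < q


# Toy nominal data (hypothetical): a 5 x 5 unit grid.
nominal = [(float(i), float(j)) for i in range(5) for j in range(5)]

far_score = anomaly_score((20.0, 20.0), nominal, k=2)
near_score = anomaly_score((2.0, 2.5), nominal, k=2)
print(far_score, is_anomaly((20.0, 20.0), nominal, k=2, q=0.05))
print(near_score, is_anomaly((2.0, 2.5), nominal, k=2, q=0.05))
```

A test point far from the nominal sample is more isolated than every nominal point and receives a score of zero, while a point inside the grid receives a high score and is accepted as nominal.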

anomaly detection, nearest neighbor graph, score function, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

Anomaly Detection with Score functions based on Nearest Neighbor Graphs

Zhao, Manqi, Saligrama, Venkatesh

Neural Information Processing Systems | Feb-15-2020, 04:12:36 GMT

We propose a novel non-parametric adaptive anomaly detection algorithm for high dimensional data based on score functions derived from nearest neighbor graphs on n-point nominal data. Anomalies are declared whenever the score of a test sample falls below q, which is supposed to be the desired false alarm level. The resulting anomaly detector is shown to be asymptotically optimal in that it is uniformly most powerful for the specified false alarm level, q, for the case when the anomaly density is a mixture of the nominal and a known density. Our algorithm is computationally efficient, being linear in dimension and quadratic in data size. It does not require choosing complicated tuning parameters or function approximation classes and it can adapt to local structure such as local change in dimensionality.

anomaly detection, nearest neighbor graph, score function, (2 more...)

Neural Information Processing Systems

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
