Nearest Neighbor Methods


ML-Based Bidding Price Prediction for Pay-As-Bid Ancillary Services Markets: A Use Case in the German Control Reserve Market

arXiv.org Machine Learning

The increasing integration of renewable energy sources has led to greater volatility and unpredictability in electricity generation, posing challenges to grid stability. Ancillary service markets, such as the German control reserve market, allow industrial consumers and producers to offer flexibility in their power consumption or generation, contributing to grid stability while earning additional income. However, many participants use simple bidding strategies that may not maximize their revenues. This paper presents a methodology for forecasting bidding prices in pay-as-bid ancillary service markets, focusing on the German control reserve market. We evaluate various machine learning models, including Support Vector Regression, Decision Trees, and k-Nearest Neighbors, and compare their performance against benchmark models. To address the asymmetry in the revenue function of pay-as-bid markets, we introduce an offset adjustment technique that enhances the practical applicability of the forecasting models. Our analysis demonstrates that the proposed approach improves potential revenues by 27.43% to 37.31% compared to baseline models. When analyzing the relationship between model forecasting error and revenue, we measure a negative correlation for three markets; according to the results, a 1 EUR/MW reduction in price forecasting error (MAE) statistically corresponds to a yearly revenue increase of between 483 EUR/MW and 3,631 EUR/MW. The proposed methodology enables industrial participants to optimize their bidding strategies, leading to increased earnings and contributing to the efficiency and stability of the electrical grid.
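The offset adjustment exploits the asymmetry of the pay-as-bid payoff: a bid slightly above the acceptance threshold earns nothing, while a bid slightly below it sacrifices only the margin. Below is a minimal Python sketch of this idea on synthetic data, assuming a scikit-learn k-NN regressor and a simplified revenue model in which an accepted bid earns its own price; all names, parameters, and the market model are illustrative, not the paper's exact setup.

```python
# A minimal sketch of the offset-adjustment idea on synthetic data.
# The revenue model is a simplifying assumption: a bid earns its own
# price (pay-as-bid) if it does not exceed the realized acceptance
# threshold, and zero otherwise.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)

# Synthetic "market": features -> acceptance-threshold price (EUR/MW).
X = rng.normal(size=(2000, 5))
price = 50 + 10 * X[:, 0] + rng.normal(scale=5, size=2000)

X_train, X_val = X[:1500], X[1500:]
p_train, p_val = price[:1500], price[1500:]

model = KNeighborsRegressor(n_neighbors=25).fit(X_train, p_train)
forecast = model.predict(X_val)

def revenue(bids, realized):
    # Accepted bids (bid <= realized threshold) earn the bid itself.
    accepted = bids <= realized
    return float(np.sum(bids * accepted))

# Overbidding by 1 EUR can cost the whole award, underbidding only the
# margin, so scan a downward offset on validation data.
offsets = np.linspace(0.0, 15.0, 61)
revs = [revenue(forecast - o, p_val) for o in offsets]
best = offsets[int(np.argmax(revs))]
print(f"best offset: {best:.2f} EUR/MW, revenue: {max(revs):.0f} EUR")
```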


2D-OOB: Attributing Data Contribution Through Joint Valuation Framework

Neural Information Processing Systems

Data valuation has emerged as a powerful framework for quantifying each datum's contribution to the training of a machine learning model. However, it is crucial to recognize that the quality of cells within a single data point can vary greatly in practice. For example, even in the case of an abnormal data point, not all cells are necessarily noisy. The single scalar score assigned by existing data valuation methods blurs the distinction between noisy and clean cells of a data point, making it challenging to interpret the data values. In this paper, we propose 2D-OOB, an out-of-bag estimation framework for jointly determining helpful (or detrimental) samples as well as the particular cells that drive them. Our comprehensive experiments demonstrate that 2D-OOB achieves state-of-the-art performance across multiple use cases while being exponentially faster. Specifically, 2D-OOB shows promising results in detecting and rectifying fine-grained outliers at the cell level, and localizing backdoor triggers in data poisoning attacks.
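As a rough intuition for the out-of-bag mechanics, the following Python sketch trains many small trees on bootstrapped rows and random column subsets, and credits each out-of-bag sample's correctness to the cells the tree actually saw. This is a simplified stand-in for 2D-OOB's estimator, with all names and parameters invented for illustration.

```python
# A hedged sketch of joint (sample, cell) out-of-bag valuation in the
# spirit of 2D-OOB; the aggregation rule here is a simplification, not
# the paper's exact estimator.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
X[10, 3] += 8.0  # plant one corrupted cell

n, d = X.shape
score = np.zeros((n, d))
count = np.zeros((n, d))

for _ in range(200):
    rows = rng.integers(0, n, size=n)            # bootstrap rows
    cols = rng.choice(d, size=4, replace=False)  # random feature subset
    tree = DecisionTreeClassifier(max_depth=3).fit(
        X[np.ix_(rows, cols)], y[rows])
    oob = np.setdiff1d(np.arange(n), rows)       # out-of-bag samples
    correct = tree.predict(X[np.ix_(oob, cols)]) == y[oob]
    # Credit each OOB sample's correctness to the cells the tree saw.
    score[np.ix_(oob, cols)] += correct[:, None]
    count[np.ix_(oob, cols)] += 1

value = score / np.maximum(count, 1)  # low value flags candidate noisy cells
print("most suspect cell:", np.unravel_index(np.argmin(value), value.shape))
```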


Persistent Homology for High-dimensional Data Based on Spectral Methods. Sebastian Damrich, Hertie Institute for AI in Brain Health, University of Tübingen, Germany

Neural Information Processing Systems

Persistent homology is a popular computational tool for analyzing the topology of point clouds, such as the presence of loops or voids. However, many real-world datasets with low intrinsic dimensionality reside in an ambient space of much higher dimensionality. We show that in this case traditional persistent homology becomes very sensitive to noise and fails to detect the correct topology. The same holds true for existing refinements of persistent homology. As a remedy, we find that spectral distances on the k-nearest-neighbor graph of the data, such as diffusion distance and effective resistance, allow the correct topology to be detected even in the presence of high-dimensional noise. Moreover, we derive a novel closed-form formula for effective resistance, and describe its relation to diffusion distances. Finally, we apply these methods to high-dimensional single-cell RNA-sequencing data and show that spectral distances allow robust detection of cell-cycle loops.
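Effective resistance has a compact expression in terms of the pseudoinverse of the graph Laplacian, R_ij = L+_ii + L+_jj - 2 L+_ij, which makes the pipeline easy to sketch. A minimal Python example on a noisy circle follows, assuming scikit-learn and the ripser.py package are installed; the dataset, neighborhood size, and other parameter choices are illustrative.

```python
# A minimal sketch: effective resistance on a kNN graph as the input
# metric for persistent homology. Assumes the ripser.py package.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from ripser import ripser

rng = np.random.default_rng(0)
# Noisy circle embedded in 50 ambient dimensions.
t = rng.uniform(0, 2 * np.pi, 300)
X = np.zeros((300, 50))
X[:, 0], X[:, 1] = np.cos(t), np.sin(t)
X += 0.15 * rng.normal(size=X.shape)

# Symmetrized unweighted kNN graph and its Laplacian.
A = kneighbors_graph(X, n_neighbors=15).toarray()
A = np.maximum(A, A.T)
L = np.diag(A.sum(1)) - A

# Effective resistance via the Laplacian pseudoinverse:
# R_ij = L+_ii + L+_jj - 2 L+_ij.
Lp = np.linalg.pinv(L)
diag = np.diag(Lp)
R = diag[:, None] + diag[None, :] - 2 * Lp

# H1 persistence on the resistance metric should show one dominant loop.
dgms = ripser(R, distance_matrix=True, maxdim=1)["dgms"]
print("most persistent H1 bar:", max(dgms[1], key=lambda b: b[1] - b[0]))
```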



No Train, all Gain: Self-Supervised Gradients Improve Deep Frozen Representations. Walter Simoncini, Andrei Bursuc

Neural Information Processing Systems

Our method is simple: given any pretrained model, we first compute gradients from various self-supervised objectives for each input. These gradients are projected to a lower dimension and then concatenated with the model's output embedding. The resulting features are evaluated on k-nearest neighbor classification over 11 datasets from vision, 5 from natural language processing, and 2 from audio.
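A hedged PyTorch sketch of the recipe follows, with a toy two-layer network standing in for the pretrained backbone and additive noise standing in for real augmentations; the self-supervised objective, projection size, and all names are illustrative assumptions rather than the paper's configuration.

```python
# A hedged sketch of gradient features: per-input gradients of a
# self-supervised objective, randomly projected and concatenated with
# the embedding. Backbone and augmentation are toy stand-ins.
import torch

torch.manual_seed(0)
backbone = torch.nn.Sequential(torch.nn.Linear(32, 64), torch.nn.ReLU())
head = torch.nn.Linear(64, 64)  # self-supervised head; grads taken here
proj = torch.randn(64 * 64 + 64, 128) / (64 * 64 + 64) ** 0.5

def gradient_feature(x):
    """Embedding concatenated with a projected self-supervised gradient."""
    v1 = x + 0.1 * torch.randn_like(x)   # toy "augmented views"
    v2 = x + 0.1 * torch.randn_like(x)
    z1, z2 = head(backbone(v1)), head(backbone(v2))
    loss = -torch.nn.functional.cosine_similarity(z1, z2, dim=0)
    head.zero_grad()
    loss.backward()
    g = torch.cat([head.weight.grad.flatten(), head.bias.grad.flatten()])
    with torch.no_grad():
        emb = backbone(x)
    return torch.cat([emb, g @ proj])    # feed this to a k-NN classifier

feats = torch.stack([gradient_feature(torch.randn(32)) for _ in range(5)])
print(feats.shape)  # (5, 192): 64-dim embedding + 128-dim gradient part
```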


Continuous Partitioning for Graph-Based Semi-Supervised Learning Chester Holtz

Neural Information Processing Systems

Laplace learning algorithms for graph-based semi-supervised learning have been shown to suffer from degeneracy at low label rates and in imbalanced class regimes. Here, we propose CutSSL: a framework for graph-based semi-supervised learning based on continuous nonconvex quadratic programming, which provably obtains integer solutions. Our framework is naturally motivated by an exact quadratic relaxation of a cardinality-constrained minimum-cut graph partitioning problem. Furthermore, we show our formulation is related to an optimization problem whose approximate solution is the mean-shifted Laplace learning heuristic, thus providing new insight into the performance of this heuristic. We demonstrate that CutSSL significantly surpasses the current state-of-the-art on k-nearest neighbor graphs and large real-world graph benchmarks across a variety of label rates, class imbalance, and label imbalance regimes.
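The mean-shifted Laplace learning heuristic that the abstract relates CutSSL to is simple enough to sketch: solve the harmonic extension L_uu U = -L_ul Y for the unlabeled nodes, then subtract the column means of U before taking the argmax. The minimal Python sketch below illustrates that heuristic on toy data; it is not an implementation of CutSSL itself, and the dataset and graph parameters are illustrative.

```python
# A sketch of mean-shifted Laplace learning on a kNN graph.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neighbors import kneighbors_graph

X, y = make_moons(n_samples=400, noise=0.1, random_state=0)
A = kneighbors_graph(X, n_neighbors=10).toarray()
A = np.maximum(A, A.T)
L = np.diag(A.sum(1)) - A

# A very low label rate: two labeled points per class.
lab = np.concatenate([np.where(y == c)[0][:2] for c in (0, 1)])
unlab = np.setdiff1d(np.arange(len(y)), lab)
Y = np.eye(2)[y[lab]]  # one-hot labels

# Harmonic extension: solve L_uu U = -L_ul Y for the unlabeled nodes.
U = np.linalg.solve(L[np.ix_(unlab, unlab)], -L[np.ix_(unlab, lab)] @ Y)

plain = U.argmax(1)                  # Laplace learning
shifted = (U - U.mean(0)).argmax(1)  # mean-shifted variant
print("plain accuracy:  ", (plain == y[unlab]).mean())
print("shifted accuracy:", (shifted == y[unlab]).mean())
```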


HARMONIC: Harnessing LLMs for Tabular Data Synthesis and Privacy Protection

Neural Information Processing Systems

Data serves as the fundamental basis for advancing deep learning. Therefore, exploring the methods for effectively using models like LLMs to generate synthetic tabular data, which is privacy-preserving but similar to original one, is urgent.In this paper, we introduce a new framework HARMONIC for tabular data generation and evaluation by LLMs. In the data generation part of our framework, we employ fine-tuning to generate tabular data and enhance privacy rather than continued pre-training which is often used by previous small-scale LLM-based methods. In particular, we construct an instruction fine-tuning dataset based on the idea of the k-nearest neighbors algorithm to inspire LLMs to discover inter-row relationships. By such fine-tuning, LLMs are trained to remember the format and connections of the data rather than the data itself, which reduces the risk of privacy leakage.
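To make the kNN-based instruction idea concrete, the Python sketch below serializes each row together with its nearest neighbor rows into an instruction/output pair. The prompt template, column names, and data are illustrative assumptions, not HARMONIC's exact format.

```python
# A hedged sketch of building a kNN-based instruction-tuning dataset
# for tabular synthesis.
import json
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
cols = ["age", "income", "score"]
table = rng.integers(20, 80, size=(100, 3))

nn = NearestNeighbors(n_neighbors=4).fit(table)
_, idx = nn.kneighbors(table)  # idx[:, 0] is the row itself

def row_str(r):
    return ", ".join(f"{c}={v}" for c, v in zip(cols, r))

examples = []
for i in range(len(table)):
    neighbors = [row_str(table[j]) for j in idx[i, 1:]]  # k=3 neighbors
    examples.append({
        "instruction": "Here are three similar records:\n"
                       + "\n".join(neighbors)
                       + "\nGenerate one more plausible record.",
        "output": row_str(table[i]),  # the row itself is the target
    })

print(json.dumps(examples[0], indent=2))
```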


Multi-scale Consistency for Robust 3D Registration via Hierarchical Sinkhorn Tree

Neural Information Processing Systems

We study the problem of retrieving accurate correspondences through multi-scale consistency (MSC) for robust point cloud registration. Existing works in a coarse-to-fine manner either suffer from severe noisy correspondences caused by unreliable coarse matching or struggle to form outlier-free coarse-level correspondence sets. To tackle this, we present Hierarchical Sinkhorn Tree (HST), a pruned tree structure designed to hierarchically measure the local consistency of each coarse correspondence across multiple feature scales, thereby filtering out the locally dissimilar ones. In this way, we convert the modeling of MSC for each correspondence into a BFS traversal with pruning of a K-ary tree rooted at the superpoint, with its K nearest neighbors in the feature pyramid serving as child nodes. To achieve efficient pruning and accurate vicinity characterization, we further propose a novel overlap-aware Sinkhorn Distance, which retains only the most likely overlapping points for local measurement and next-level exploration.
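The local measurement primitive here is the entropic Sinkhorn distance between two small point sets. The Python sketch below implements the standard Sinkhorn iterations with uniform weights; the paper's overlap-aware variant additionally discards points unlikely to lie in the overlap, which this sketch does not do, and all sizes and regularization values are illustrative.

```python
# A sketch of the entropic Sinkhorn distance between two local patches.
import numpy as np

def sinkhorn_distance(P, Q, eps=0.1, iters=200):
    """Entropic OT cost between two point sets with uniform weights."""
    C = np.linalg.norm(P[:, None, :] - Q[None, :, :], axis=-1)  # cost matrix
    K = np.exp(-C / eps)
    a, b = np.full(len(P), 1 / len(P)), np.full(len(Q), 1 / len(Q))
    u, v = np.ones(len(P)), np.ones(len(Q))
    for _ in range(iters):          # Sinkhorn scaling iterations
        u = a / (K @ v)
        v = b / (K.T @ u)
    T = u[:, None] * K * v[None, :]  # transport plan
    return float((T * C).sum())

rng = np.random.default_rng(0)
P = rng.normal(size=(30, 3))              # local patch around a superpoint
Q = P + 0.05 * rng.normal(size=(30, 3))   # a well-aligned counterpart
print(sinkhorn_distance(P, Q))            # small => consistent correspondence
```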


Embedding Dimension of Contrastive Learning and k-Nearest Neighbors

Neural Information Processing Systems

We study the embedding dimension of distance comparison data in two settings: contrastive learning and k-nearest neighbors (k-NN). In both cases, the goal is to find the smallest dimension d of an \ell_p-space in which a given dataset can be represented. We show that the arboricity of the associated graphs plays a key role in designing embeddings. Using this approach, for the most frequently used \ell_2-distance, we get matching upper and lower bounds in both settings. In contrastive learning, we are given m labeled samples of the form (x_i, y_i^+, z_i^-), representing the fact that the positive example y_i^+ is closer to the anchor x_i than the negative example z_i^-.
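To make the contrastive setting concrete, here is a small numpy sketch that fits a d-dimensional \ell_2 embedding to such triples by stochastic hinge-loss descent. The optimizer, margin, and step size are illustrative assumptions; the paper concerns the smallest feasible d, which this sketch does not compute.

```python
# Fitting an \ell_2 embedding to contrastive triples (a, p, q) meaning
# "p is closer to anchor a than q is"; a toy demonstration only.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 40, 3, 300
truth = rng.normal(size=(n, d))  # hidden ground-truth points

# Sample comparisons consistent with the ground truth.
triples = []
for _ in range(m):
    a, p, q = rng.choice(n, size=3, replace=False)
    if np.linalg.norm(truth[a] - truth[p]) > np.linalg.norm(truth[a] - truth[q]):
        p, q = q, p
    triples.append((a, p, q))

E = rng.normal(size=(n, d))  # embedding to be learned
lr, margin = 0.05, 0.1
for _ in range(2000):
    a, p, q = triples[rng.integers(m)]
    # Hinge on squared distances; update only violated comparisons.
    if np.sum((E[a] - E[p]) ** 2) - np.sum((E[a] - E[q]) ** 2) + margin > 0:
        ga = 2 * (E[q] - E[p])
        gp = 2 * (E[p] - E[a])
        gq = 2 * (E[a] - E[q])
        E[a], E[p], E[q] = E[a] - lr * ga, E[p] - lr * gp, E[q] - lr * gq

sat = np.mean([np.linalg.norm(E[a] - E[p]) < np.linalg.norm(E[a] - E[q])
               for a, p, q in triples])
print(f"satisfied comparisons: {sat:.0%}")
```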


Bags of Projected Nearest Neighbours: Competitors to Random Forests?

arXiv.org Machine Learning

In this paper we introduce a simple and intuitive adaptive k nearest neighbours classifier, and explore its utility within the context of bootstrap aggregating ("bagging"). The approach is based on finding discriminant subspaces that are cheap to compute and are motivated by enhancing the discrimination between classes in nearest neighbour classifiers. This adaptiveness promotes diversity among the individual classifiers fit across different bootstrap samples, and so further leverages the variance-reducing effect of bagging. Extensive experimental results are presented documenting the strong performance of the proposed approach in comparison with Random Forest classifiers, as well as other nearest neighbours based ensembles from the literature and other relevant benchmarks. Code to implement the proposed approach is available in the form of an R package from https://github.com/DavidHofmeyr/BOPNN.
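The overall loop is easy to sketch. Below is a hedged Python approximation in which LDA supplies the discriminant subspace for each bootstrap sample; this is a stand-in for the paper's nearest-neighbour-motivated projections, and the authors' reference implementation is the R package linked above. Dataset, ensemble size, and k are illustrative.

```python
# Bagged nearest-neighbour classifiers in per-bootstrap discriminant
# subspaces (LDA as a stand-in projection).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=20, n_informative=4,
                           n_classes=3, random_state=0)
X_tr, X_te, y_tr, y_te = X[:500], X[500:], y[:500], y[500:]

votes = np.zeros((len(X_te), 3))
for _ in range(50):                              # 50 bagged classifiers
    idx = rng.integers(0, len(X_tr), len(X_tr))  # bootstrap sample
    lda = LinearDiscriminantAnalysis(n_components=2).fit(X_tr[idx], y_tr[idx])
    knn = KNeighborsClassifier(n_neighbors=5).fit(
        lda.transform(X_tr[idx]), y_tr[idx])
    votes[np.arange(len(X_te)), knn.predict(lda.transform(X_te))] += 1

print("bagged accuracy:", (votes.argmax(1) == y_te).mean())
```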