Goto

Collaborating Authors

 Nearest Neighbor Methods


Meta-Instance Selection. Instance Selection as a Classification Problem with Meta-Features

arXiv.org Artificial Intelligence

Data pruning, or instance selection, is an important problem in machine learning especially in terms of nearest neighbour classifier. However, in data pruning which speeds up the prediction phase, there is an issue related to the speed and efficiency of the process itself. In response, the study proposes an approach involving transforming the instance selection process into a classification task conducted in a unified meta-feature space where each instance can be classified and assigned to either the "to keep" or "to remove" class. This approach requires training an appropriate meta-classifier, which can be developed based on historical instance selection results from other datasets using reference instance selection methods as a labeling tool. This work proposes constructing the meta-feature space based on properties extracted from the nearest neighbor graph. Experiments conducted on 17 datasets of varying sizes and five reference instance selection methods (ENN, Drop3, ICF, HMN-EI, and CCIS) demonstrate that the proposed solution achieves results comparable to reference instance selection methods while significantly reducing computational complexity. In the proposed approach, the computational complexity of the system depends only on identifying the k-nearest neighbors for each data sample and running the meta-classifier. Additionally, the scaling law turns into the requirement of huge compute power both during training and prediction which is not always applicable in real live scenarios where the compute resources are limited. In that case, both the training data and the prediction model should require small computing resources. Therefore the training set should ensure a possible small size but keep the prediction accuracy of the original training set. This issue is not new, along with its development has been started primarily for the nearest neighbor classifier under the name of instance selection. Thus, already in the late 1960s and early 1970s, algorithms such as Condensed Nearest Neighbor (CNN), Edited Nearest Neighbor (ENN), and many others were developed. The benchmarks of instance selection indicate the Drop3 [3] and ICF [4] algorithms as the most wildly used, which, despite not being new, are characterized by excellent properties in terms of the balance between the prediction accuracy of the kNN algorithm and the reduction of the size of the stored set of prototypes (reduction_rate) [5]. These algorithms are also applicable not only as elements of the learning process for the kNN algorithm (prototype selection as part of the learning process) and hence could also be used as universal algorithms for reducing the size of the training set for any classifier, thereby accelerating the learning process of complex predictive models, the process of finding optimal parameters, etc. Examples of such applications can be found in [6, 7] or in [5].


Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding

Neural Information Processing Systems

The real-world deployment of an autonomous driving system requires its components to run on-board and in real-time, including the motion prediction module that predicts the future trajectories of surrounding traffic participants. Existing agent-centric methods have demonstrated outstanding performance on public benchmarks. However, they suffer from high computational overhead and poor scalability as the number of agents to be predicted increases. To address this problem, we introduce the K-nearest neighbor attention with relative pose encoding (KNARPE), a novel attention mechanism allowing the pairwise-relative representation to be used by Transformers. Then, based on KNARPE we present the Heterogeneous Polyline Transformer with Relative pose encoding (HPTR), a hierarchical framework enabling asynchronous token update during the online inference.


Unlimiformer: Long-Range Transformers with Unlimited Length Input

Neural Information Processing Systems

Since the proposal of transformers, these models have been limited to bounded input lengths, because of their need to attend to every token in the input. In this work, we propose Unlimiformer: a general approach that wraps any existing pretrained encoder-decoder transformer, and offloads the cross-attention computation to a single k -nearest-neighbor ( k NN) index, while the returned k NN distances are the attention dot-product scores. This k NN index can be kept on either the GPU or CPU memory and queried in sub-linear time; this way, we can index practically unlimited input sequences, while every attention head in every decoder layer retrieves its top- k keys, instead of attending to every key. We evaluate Unlimiformer on several long-document and book-summarization benchmarks, showing that it can process even **500k** token-long inputs from the BookSum dataset, without any input truncation at test time. We demonstrate that Unlimiformer improves pretrained models such as BART and Longformer by extending them to unlimited inputs without additional learned weights and without modifying their code.


Distributionally robust weighted k-nearest neighbors

Neural Information Processing Systems

Learning a robust classifier from a few samples remains a key challenge in machine learning. A major thrust of research has been focused on developing k-nearest neighbor (k-NN) based algorithms combined with metric learning that captures similarities between samples. When the samples are limited, robustness is especially crucial to ensure the generalization capability of the classifier. In this paper, we study a minimax distributionally robust formulation of weighted k-nearest neighbors, which aims to find the optimal weighted k-NN classifiers that hedge against feature uncertainties. We develop an algorithm, Dr.k-NN, that efficiently solves this functional optimization problem and features in assigning minimax optimal weights to training samples when performing classification.


Discriminative Metric Learning by Neighborhood Gerrymandering

Neural Information Processing Systems

We formulate the problem of metric learning for k nearest neighbor classification as a large margin structured prediction problem, with a latent variable representing the choice of neighbors and the task loss directly corresponding to classification error. We describe an efficient algorithm for exact loss augmented inference,and a fast gradient descent algorithm for learning in this model. The objective drives the metric to establish neighborhood boundaries that benefit the true class labels for the training points. Our approach, reminiscent of gerrymandering (redrawing of political boundaries to provide advantage to certain parties), is more direct in its handling of optimizing classification accuracy than those previously proposed. In experiments on a variety of data sets our method is shown to achieve excellent results compared to current state of the art in metric learning.


K-Nearest-Neighbor Local Sampling Based Conditional Independence Testing

Neural Information Processing Systems

Conditional independence (CI) testing is a fundamental task in statistics and machine learning, but its effectiveness is hindered by the challenges posed by high-dimensional conditioning variables and limited data samples. This article introduces a novel testing approach to address these challenges and enhance control of the type I error while achieving high power under alternative hypotheses. The proposed approach incorporates a computationally efficient classifier-based conditional mutual information (CMI) estimator, capable of capturing intricate dependence structures among variables. To approximate a distribution encoding the null hypothesis, a k -nearest-neighbor local sampling strategy is employed. An important advantage of this approach is its ability to operate without assumptions about distribution forms or feature dependencies. Furthermore, it eliminates the need to derive asymptotic null distributions for the estimated CMI and avoids dataset splitting, making it particularly suitable for small datasets.


Differentiable Top-k with Optimal Transport

Neural Information Processing Systems

Finding the k largest or smallest elements from a collection of scores, i.e., top-k operation, is an important model component widely used in information retrieval, machine learning, and data mining. However, if the top-k operation is implemented in an algorithmic way, e.g., using bubble algorithm, the resulted model cannot be trained in an end-to-end way using prevalent gradient descent algorithms. This is because these implementations typically involve swapping indices, whose gradient cannot be computed. Moreover, the corresponding mapping from the input scores to the indicator vector of whether this element belongs to the top-k set is essentially discontinuous. To address the issue, we propose a smoothed approximation, namely SOFT (Scalable Optimal transport-based diFferenTiable) top-k operator.


Explaining k-Nearest Neighbors: Abductive and Counterfactual Explanations

arXiv.org Artificial Intelligence

Despite the wide use of $k$-Nearest Neighbors as classification models, their explainability properties remain poorly understood from a theoretical perspective. While nearest neighbors classifiers offer interpretability from a "data perspective", in which the classification of an input vector $\bar{x}$ is explained by identifying the vectors $\bar{v}_1, \ldots, \bar{v}_k$ in the training set that determine the classification of $\bar{x}$, we argue that such explanations can be impractical in high-dimensional applications, where each vector has hundreds or thousands of features and it is not clear what their relative importance is. Hence, we focus on understanding nearest neighbor classifications through a "feature perspective", in which the goal is to identify how the values of the features in $\bar{x}$ affect its classification. Concretely, we study abductive explanations such as "minimum sufficient reasons", which correspond to sets of features in $\bar{x}$ that are enough to guarantee its classification, and "counterfactual explanations" based on the minimum distance feature changes one would have to perform in $\bar{x}$ to change its classification. We present a detailed landscape of positive and negative complexity results for counterfactual and abductive explanations, distinguishing between discrete and continuous feature spaces, and considering the impact of the choice of distance function involved. Finally, we show that despite some negative complexity results, Integer Quadratic Programming and SAT solving allow for computing explanations in practice.


A Neighbor-based Approach to Pitch Ownership Models in Soccer

arXiv.org Artificial Intelligence

Pitch ownership models allow many types of analysis in soccer and provide valuable assistance to tactical analysts in understanding the game's dynamics. The novelty they provide over event-based analysis is that tracking data incorporates context that event-based data does not possess, like player positioning. This paper proposes a novel approach to building pitch ownership models in soccer games using the K-Nearest Neighbors (KNN) algorithm. Our approach provides a fast inference mechanism that can model different approaches to pitch control using the same algorithm. Despite its flexibility, it uses only three hyperparameters to tune the model, facilitating the tuning process for different player skill levels. The flexibility of the approach allows for the emulation of different methods available in the literature by adjusting a small number of parameters, including adjusting for different levels of uncertainty. In summary, the proposed model provides a new and more flexible strategy for building pitch ownership models, extending beyond just replicating existing algorithms, and can provide valuable insights for tactical analysts and open up new avenues for future research. We thoroughly visualize several examples demonstrating the presented models' strengths and weaknesses. The code is available at github.com/nvsclub/KNNPitchControl.


A novel Facial Recognition technique with Focusing on Masked Faces

arXiv.org Artificial Intelligence

Recognizing the same faces with and without masks is important for ensuring consistent identification in security, access control, and public safety. This capability is crucial in scenarios like law enforcement, healthcare, and surveillance, where accurate recognition must be maintained despite facial occlusion. This research focuses on the challenge of recognizing the same faces with and without masks by employing cosine similarity as the primary technique. With the increased use of masks, traditional facial recognition systems face significant accuracy issues, making it crucial to develop methods that can reliably identify individuals in masked conditions. For that reason, this study proposed Masked-Unmasked Face Matching Model (MUFM). This model employs transfer learning using the Visual Geometry Group (VGG16) model to extract significant facial features, which are subsequently classified utilizing the K-Nearest Neighbors (K-NN) algorithm. The cosine similarity metric is employed to compare masked and unmasked faces of the same individuals. This approach represents a novel contribution, as the task of recognizing the same individual with and without a mask using cosine similarity has not been previously addressed. By integrating these advanced methodologies, the research demonstrates effective identification of individuals despite the presence of masks, addressing a significant limitation in traditional systems. Using data is another essential part of this work, by collecting and preparing an image dataset from three different sources especially some of those data are real provided a comprehensive power of this research. The image dataset used were already collected in three different datasets of masked and unmasked for the same faces.