Nearest Neighbor Methods
On Convergence of Nearest Neighbor Classifiers over Feature Transformations
Rimanic, Luka, Renggli, Cedric, Li, Bo, Zhang, Ce
The k-Nearest Neighbors (kNN) classifier is a fundamental non-parametric machine learning algorithm. However, it is well known that it suffers from the curse of dimensionality, which is why in practice one often applies a kNN classifier on top of a (pre-trained) feature transformation. From a theoretical perspective, most, if not all, theoretical results aimed at understanding the kNN classifier are derived for the raw feature space. This leads to an emerging gap between our theoretical understanding of kNN and its practical applications. In this paper, we take a first step towards bridging this gap. We provide a novel analysis of the convergence rates of a kNN classifier over transformed features. This analysis requires an in-depth understanding of the properties that connect the transformed space and the raw feature space. More precisely, we build our convergence bound upon two key properties of the transformed space: (1) safety -- how well one can recover the raw posterior from the transformed space, and (2) smoothness -- how complex this recovery function is. Based on our result, we are able to explain why some (pre-trained) feature transformations are better suited for a kNN classifier than others. We empirically validate that both properties have an impact on kNN convergence on 30 feature transformations with 6 benchmark datasets spanning the vision and text domains.
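For a rough illustration of the setup the paper studies, here is a minimal sketch comparing a kNN classifier on raw features with one on transformed features. PCA stands in for a pre-trained feature transformation, and the dataset and hyperparameters are illustrative assumptions, not choices from the paper.

```python
# Minimal sketch: kNN on raw features vs. kNN on transformed features.
# PCA is a stand-in for a (pre-trained) feature transformation.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

raw_knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
feat_knn = make_pipeline(PCA(n_components=16),
                         KNeighborsClassifier(n_neighbors=5)).fit(X_tr, y_tr)

print("raw space accuracy:        ", raw_knn.score(X_te, y_te))
print("transformed space accuracy:", feat_knn.score(X_te, y_te))
```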
Monitoring Trust in Human-Machine Interactions for Public Sector Applications
Faruqe, Farhana, Watkins, Ryan, Medsker, Larry
The work reported here addresses the capacity of psychophysiological sensors and measures using Electroencephalogram (EEG) and Galvanic Skin Response (GSR) to detect levels of trust for humans using AI-supported Human-Machine Interaction (HMI). Improvements to the analysis of EEG and GSR data may create models that perform as well as, or better than, traditional tools. A challenge in analyzing the EEG and GSR data is the large amount of training data required due to the large number of variables in the measurements. Researchers have routinely used standard machine-learning classifiers like artificial neural networks (ANN), support vector machines (SVM), and K-nearest neighbors (KNN). Traditionally, these have provided few insights into which features of the EEG and GSR data facilitate the most and least accurate predictions - thus making it harder to improve the HMI and human-machine trust relationship. A key ingredient in applying trust-sensor research results to practical situations and monitoring trust in work environments is understanding which key features contribute to trust and then reducing the amount of data needed for practical applications. We used the Local Interpretable Model-agnostic Explanations (LIME) model as a process to reduce the volume of data required to monitor and enhance trust in HMI systems - a technology that could be valuable for governmental and public sector applications. Explainable AI can make HMI systems transparent and promote trust. From customer service in government agencies and community-level non-profit public service organizations to national military and cybersecurity institutions, many public sector organizations are increasingly concerned with having effective and ethical HMI with services that are trustworthy, unbiased, and free of unintended negative consequences.
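The LIME-based workflow described above can be sketched as follows. The synthetic features, labels, and feature names below are stand-ins for real EEG/GSR data, and the kNN model is just one of the classifiers the abstract mentions; LIME's tabular explainer then ranks the features that drive an individual prediction, which is the basis for reducing the data needed.

```python
# Hedged sketch of the workflow: fit a classifier on (synthetic, stand-in)
# sensor features, then use LIME to rank which features drive a prediction.
# Requires `pip install lime scikit-learn`.
import numpy as np
from lime.lime_tabular import LimeTabularExplainer
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                    # stand-in for EEG/GSR features
y = (X[:, 0] + 0.5 * X[:, 3] > 0).astype(int)    # stand-in "trust" labels
names = [f"feat_{i}" for i in range(X.shape[1])] # hypothetical feature names

clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
explainer = LimeTabularExplainer(X, feature_names=names,
                                 class_names=["distrust", "trust"])
exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=3)
print(exp.as_list())  # top features with local weights; candidates to keep
```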
On the Power of Abstention and Data-Driven Decision Making for Adversarial Robustness
Balcan, Maria-Florina, Blum, Avrim, Sharma, Dravyansh, Zhang, Hongyang
A common theme in adversarial attacks on neural networks is that changes that either are imperceptible or should be irrelevant to the classification task can lead to drastically different network behavior. One reason for this vulnerability to adversarial attack is the non-Lipschitzness property of typical neural networks: small but adversarial movements in the input space can often produce large perturbations in the feature space. In this work, we consider the question of whether non-Lipschitz networks are intrinsically vulnerable, or if they could still be made robust to adversarial attack, in an abstract but (we believe) instructive adversarial model. In particular, suppose an adversary, by making an imperceptible change to an input $x$, can cause its representation $F(x)$ in feature space (the penultimate layer of the network) to move by an arbitrary amount: will such an adversary always win? Clearly, if the adversary can modify $F(x)$ by an arbitrary amount in an arbitrary direction, then yes. But what if the adversary can modify $F(x)$ by an arbitrary amount but only in a random direction (which it cannot control)? In this case, we show an interesting dichotomy: if the classifier must output a classification on any input it is given, then the adversary will still win, no matter how well-separated the classes are in feature space and no matter what decision surface the classifier uses.
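To make the role of abstention concrete, here is a minimal illustrative sketch (not the paper's algorithm) of a nearest-neighbor rule in feature space that abstains whenever the query lies farther than a threshold tau from every training representation, so large random perturbations tend to land in the abstain region rather than force a label.

```python
# Illustrative sketch: a nearest-neighbor style classifier in feature space
# that abstains (returns None) when the query is far from all training points.
import numpy as np

def predict_or_abstain(F_train, y_train, f_query, tau):
    dists = np.linalg.norm(F_train - f_query, axis=1)
    i = np.argmin(dists)
    return y_train[i] if dists[i] <= tau else None   # None = abstain

F_train = np.array([[0.0, 0.0], [5.0, 5.0]])          # toy feature vectors
y_train = np.array([0, 1])
print(predict_or_abstain(F_train, y_train, np.array([0.2, -0.1]), tau=1.0))  # 0
print(predict_or_abstain(F_train, y_train, np.array([9.0, -7.0]), tau=1.0))  # None
```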
Similarity Based Stratified Splitting: an approach to train better classifiers
Farias, Felipe, Ludermir, Teresa, Bastos-Filho, Carmelo
We propose a Similarity-Based Stratified Splitting (SBSS) technique, which uses both the output and input space information to split the data. The splits are generated using similarity functions among samples to place similar samples in different splits. This approach allows for a better representation of the data in the training phase, and leads to a more realistic performance estimation when used in real-world applications. We evaluate our proposal on twenty-two benchmark datasets with classifiers such as Multi-Layer Perceptron, Support Vector Machine, Random Forest, and K-Nearest Neighbors, and five similarity functions: Cityblock, Chebyshev, Cosine, Correlation, and Euclidean. According to the Wilcoxon signed-rank test, our approach consistently outperformed ordinary stratified 10-fold cross-validation in 75\% of the assessed scenarios.
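One possible reading of this idea, sketched under the assumption that similarity is measured against the class centroid (the authors evaluate several similarity functions, and this is not necessarily their exact procedure): order each class by similarity and deal consecutive, mutually similar samples round-robin into the k splits.

```python
# Sketch of a similarity-based stratified split: within each class, sort
# samples by distance to the class centroid, then assign consecutive
# (similar) samples round-robin so they land in different folds.
import numpy as np

def sbss_folds(X, y, k=10):
    folds = [[] for _ in range(k)]
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        centroid = X[idx].mean(axis=0)
        order = idx[np.argsort(np.linalg.norm(X[idx] - centroid, axis=1))]
        for j, sample in enumerate(order):        # round-robin assignment
            folds[j % k].append(sample)
    return [np.array(f) for f in folds]

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = rng.integers(0, 2, size=100)
print([len(f) for f in sbss_folds(X, y)])         # roughly equal fold sizes
```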
Analysis of KNN Density Estimation
We analyze the $\ell_1$ and $\ell_\infty$ convergence rates of the $k$-nearest neighbor (kNN) density estimation method. Our analysis covers two cases, depending on whether the support set is bounded. In the first case, the probability density function has bounded support and is bounded away from zero. We show that kNN density estimation is minimax optimal under both $\ell_1$ and $\ell_\infty$ criteria if the support set is known. If the support set is unknown, then the convergence rate of the $\ell_1$ error is unaffected, while the $\ell_\infty$ error does not converge. In the second case, the probability density function can approach zero and is smooth everywhere. Moreover, the Hessian is assumed to decay with the density values. For this case, our result shows that the $\ell_\infty$ error of kNN density estimation is nearly minimax optimal. The $\ell_1$ error does not reach the minimax lower bound, but is better than that of kernel density estimation.
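For reference, the estimator being analyzed is the classical kNN density estimate: at a query point $x$, it divides $k$ by $n$ times the volume of the smallest ball around $x$ containing the $k$ nearest samples. A minimal sketch, with an illustrative standard-normal example:

```python
# kNN density estimate: f_hat(x) = k / (n * c_d * rho_k(x)**d), where
# rho_k(x) is the distance from x to its k-th nearest sample and c_d is
# the volume of the d-dimensional unit ball.
import numpy as np
from math import gamma, pi

def knn_density(x, data, k):
    n, d = data.shape
    rho_k = np.sort(np.linalg.norm(data - x, axis=1))[k - 1]
    c_d = pi ** (d / 2) / gamma(d / 2 + 1)          # unit-ball volume
    return k / (n * c_d * rho_k ** d)

samples = np.random.default_rng(0).normal(size=(2000, 1))
print(knn_density(np.array([0.0]), samples, k=50))  # ~ 1/sqrt(2*pi) ~ 0.399
```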
Visualizing classification results
Raymaekers, Jakob, Rousseeuw, Peter J., Hubert, Mia
Classification is a major tool of statistics and machine learning. A classification method first processes a training set of objects with given classes (labels), with the goal of afterward assigning new objects to one of these classes. When running the resulting prediction method on the training data or on test data, it can happen that an object is predicted to lie in a class that differs from its given label. This is sometimes called label bias, and raises the question of whether the object was mislabeled. Our goal is to visualize aspects of the data classification to obtain insight. The proposed display reflects to what extent each object's label is (dis)similar to its prediction, how far each object lies from the other objects in its class, and whether some objects lie far from all classes. The display is constructed for discriminant analysis, the k-nearest neighbor classifier, support vector machines, logistic regression, and majority voting. It is illustrated on several benchmark datasets containing images and texts.
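A generic stand-in for such a display (not the authors' exact construction) can be sketched by plotting, for each object, its distance to its own class against its distance to the nearest other class, and marking objects whose prediction disagrees with their given label.

```python
# Sketch of a classification display: distance to own class vs. distance to
# nearest other class, with disagreements between prediction and label marked.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
pred = KNeighborsClassifier(n_neighbors=5).fit(X, y).predict(X)
cents = np.array([X[y == c].mean(axis=0) for c in np.unique(y)])
d = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
own = d[np.arange(len(y)), y].copy()              # distance to own centroid
d[np.arange(len(y)), y] = np.inf
other = d.min(axis=1)                             # nearest other centroid
plt.scatter(own, other, c=(pred != y), cmap="coolwarm")
plt.xlabel("distance to own class centroid")
plt.ylabel("distance to nearest other class centroid")
plt.show()
```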
Introduction to Machine Learning in R
This course material is aimed at people who are already familiar with ... What you'll learn This course is about the fundamental concepts of machine learning, focusing on neural networks. This topic is attracting a great deal of attention nowadays because these learning algorithms can be used in many fields, from software engineering to investment banking. Learning algorithms can recognize patterns, which can help detect cancer, for example. We can also construct algorithms that make a very good guess about stock price movements in the market.
Conditional Image Retrieval
Hamilton, Mark, Fu, Stephanie, Lu, Mindren, Freeman, William T.
This work introduces Conditional Image Retrieval (CIR) systems: IR methods that can efficiently specialize to specific subsets of images on the fly. These systems broaden the class of queries IR systems support and eliminate the need for expensive re-fitting to specific subsets of data. Specifically, we adapt tree-based K-Nearest Neighbor (KNN) data-structures to the conditional setting by introducing additional inverted-index data-structures. This speeds up conditional queries and does not slow down queries without conditioning. We present two new datasets for evaluating the performance of CIR systems and evaluate a variety of design choices. As a motivating application, we present an algorithm that can explore shared semantic content between works of art of vastly different media and cultural origin. Finally, we demonstrate that CIR data-structures can identify Generative Adversarial Network (GAN) "blind spots": areas where GANs fail to properly model the true data distribution.
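The core data-structure idea can be sketched in simplified form (brute-force search within subsets rather than the paper's tree-based version): an inverted index maps a condition tag to the rows that carry it, and a conditional query searches only those rows. The tags and vectors below are synthetic stand-ins.

```python
# Simplified conditional kNN: inverted index (tag -> row ids) plus a
# brute-force nearest-neighbor search restricted to the tag's subset.
from collections import defaultdict
import numpy as np

class ConditionalKNN:
    def __init__(self, vectors, tags):
        self.vectors = np.asarray(vectors)
        self.index = defaultdict(list)               # tag -> row ids
        for i, t in enumerate(tags):
            self.index[t].append(i)

    def query(self, q, tag, k=5):
        ids = np.array(self.index[tag])
        d = np.linalg.norm(self.vectors[ids] - q, axis=1)
        return ids[np.argsort(d)[:k]]                # k nearest within subset

rng = np.random.default_rng(0)
db = ConditionalKNN(rng.normal(size=(1000, 64)),
                    rng.choice(["painting", "sculpture"], size=1000))
print(db.query(rng.normal(size=64), "painting", k=3))
```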
'Less Than One'-Shot Learning: Learning N Classes From M<N Samples
Sucholutsky, Ilia, Schonlau, Matthias
Deep neural networks require large training sets but suffer from high computational cost and long training times. Training on much smaller training sets while maintaining nearly the same accuracy would be very beneficial. In the few-shot learning setting, a model must learn a new class given only a small number of samples from that class. One-shot learning is an extreme form of few-shot learning where the model must learn a new class from a single example. We propose the `less than one'-shot learning task where models must learn $N$ new classes given only $M<N$ examples, and we show that this is achievable with the help of soft labels.
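A toy illustration of how soft labels make this possible (a sketch of the general idea, not the paper's exact method): two training points carrying soft labels over three classes suffice for a distance-weighted soft-label nearest-neighbor rule to predict the third class in the region between them.

```python
# Two samples (M = 2), three classes (N = 3): the soft-label mass on class 2
# dominates midway between the points, so a third class emerges.
import numpy as np

X = np.array([[-1.0], [1.0]])                      # M = 2 training samples
S = np.array([[0.6, 0.0, 0.4],                     # soft labels over N = 3 classes
              [0.0, 0.6, 0.4]])

def predict(x):
    w = 1.0 / (np.abs(X[:, 0] - x) + 1e-9)         # inverse-distance weights
    return np.argmax(w @ S)                        # class with most soft-label mass

print([predict(x) for x in (-1.0, 0.0, 1.0)])      # [0, 2, 1]
```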
Universal consistency of Wasserstein $k$-NN classifier
The Wasserstein distance provides a notion of dissimilarity between probability measures, which has recently found applications in learning from structured data of varying size, such as images and text documents. In this work, we analyze the $k$-nearest neighbor classifier ($k$-NN) under the Wasserstein distance and establish its universal consistency on families of distributions. Using previously known results on the consistency of the $k$-NN classifier on infinite-dimensional metric spaces, it suffices to show that these families are countable unions of finite-dimensional components. As a result, we are able to prove universal consistency of $k$-NN on spaces of finitely supported measures, the space of finite wavelet series, and the spaces of Gaussian measures with commuting covariance matrices.
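For one-dimensional empirical measures, where the Wasserstein distance has a closed form, a $k$-NN classifier of this kind can be sketched directly with SciPy's exact 1-D solver; the Gaussian samples below are synthetic stand-ins for finitely supported measures.

```python
# k-NN classification of 1-D empirical distributions under the Wasserstein
# distance, using scipy's exact one-dimensional solver.
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
train = [(rng.normal(0, 1, 30), 0) for _ in range(20)] + \
        [(rng.normal(3, 1, 30), 1) for _ in range(20)]

def knn_predict(query, k=5):
    d = sorted((wasserstein_distance(query, s), lbl) for s, lbl in train)
    votes = [lbl for _, lbl in d[:k]]
    return max(set(votes), key=votes.count)         # majority vote

print(knn_predict(rng.normal(3, 1, 30)))            # likely class 1
```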