Nearest Neighbor Methods
An Incremental Nearest Neighbor Algorithm with Queries
We consider the general problem of learning multi-category classifi(cid:173) cation from labeled examples. We present experimental results for a nearest neighbor algorithm which actively selects samples from different pattern classes according to a querying rule instead of the a priori class probabilities. The amount of improvement of this query-based approach over the passive batch approach depends on the complexity of the Bayes rule. The principle on which this al(cid:173) gorithm is based is general enough to be used in any learning algo(cid:173) rithm which permits a model-selection criterion and for which the error rate of the classifier is calculable in terms of the complexity of the model.
K-Local Hyperplane and Convex Distance Nearest Neighbor Algorithms
Guided by an initial idea of building a complex (non linear) decision surface with maximal local margin in input space, we give a possible geometrical intuition as to why K-Nearest Neighbor (KNN) algorithms often perform more poorly than SVMs on classification tasks. We then propose modified K-Nearest Neighbor algorithms to overcome the per- ceived problem. The approach is similar in spirit to Tangent Distance, but with invariances inferred from the local neighborhood rather than prior knowledge. Experimental results on real world classification tasks sug- gest that the modified KNN algorithms often give a dramatic improve- ment over standard KNN and perform as well or better than SVMs.
New Algorithms for Efficient High Dimensional Non-parametric Classification
This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational lee- way, and we investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the pre- diction phase of SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN.
An Investigation of Practical Approximate Nearest Neighbor Algorithms
This paper concerns approximate nearest neighbor searching algorithms, which have become increasingly important, especially in high dimen- sional perception areas such as computer vision, with dozens of publica- tions in recent years. Much of this enthusiasm is due to a successful new approximate nearest neighbor approach called Locality Sensitive Hash- ing (LSH). In this paper we ask the question: can earlier spatial data structure approaches to exact nearest neighbor, such as metric trees, be altered to provide approximate answers to proximity queries and if so, how? We introduce a new kind of metric tree that allows overlap: certain datapoints may appear in both the children of a parent. We show why these structures should be able to exploit the same random- projection-based approximations that LSH enjoys, but with a simpler al- gorithm and perhaps with greater efficiency.
Distance Metric Learning for Large Margin Nearest Neighbor Classification
We show how to learn a Mahanalobis distance metric for k -nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k -nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification--for example, achieving a test error rate of 1.3% on the MNIST handwritten digits. As in support vector machines (SVMs), the learning problem reduces to a convex optimization based on the hinge loss. Unlike learning in SVMs, however, our framework requires no modification or extension for problems in multiway (as opposed to binary) classification.
Nearest Neighbor Based Feature Selection for Regression and its Application to Neural Activity
We present a non-linear, simple, yet effective, feature subset selection method for regression and use it in analyzing cortical neural activity. Our algorithm involves a feature-weighted version of the k-nearest-neighbor algorithm. It is able to capture complex dependency of the target func- tion on its input and makes use of the leave-one-out error as a natural regularization. We explain the characteristics of our algorithm on syn- thetic problems and use it in the context of predicting hand velocity from spikes recorded in motor cortex of a behaving monkey. By applying fea- ture selection we are able to improve prediction quality and suggest a novel way of exploring neural data.
Large Margin Component Analysis
Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, dis- tance learning algorithms cannot be used due to overfitting and high computa- tional complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a met- ric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimen- sionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin.
Geometric entropy minimization (GEM) for anomaly detection and localization
We introduce a novel adaptive non-parametric anomaly detection approach, called GEM, that is based on the minimal covering properties of K-point entropic graphs when constructed on N training samples from a nominal probability distribution. Such graphs have the property that as N their span recovers the entropy minimizing set that supports at least K/N (100)% of the mass of the Lebesgue part of the distribution. When a test sample falls outside of the entropy minimizing set an anomaly can be declared at a statistical level of significance 1 - . A method for implementing this non-parametric anomaly detector is proposed that approximates this minimum entropy set by the influence region of a K-point entropic graph built on the training data. By implementing an incremental leave-one-out k-nearest neighbor graph on resampled subsets of the training data GEM can efficiently detect outliers at a given level of significance and compute their empirical p-values.
Estimation of Information Theoretic Measures for Continuous Random Variables
We analyze the estimation of information theoretic measures of continuous random variables such as: differential entropy, mutual information or Kullback-Leibler divergence. The objective of this paper is two-fold. First, we prove that the information theoretic measure estimates using the k-nearest-neighbor density estimation with fixed k converge almost surely, even though the k-nearest-neighbor density estimation with fixed k does not converge to its true measure. Second, we show that the information theoretic measure estimates do not converge for k growing linearly with the number of samples. Nevertheless, these nonconvergent estimates can be used for solving the two-sample problem and assessing if two random variables are independent.
Learning a Distance Metric from a Network
Many real-world networks are described by both connectivity information and features for every node. To better model and understand these networks, we present structure preserving metric learning (SPML), an algorithm for learning a Mahalanobis distance metric from a network such that the learned distances are tied to the inherent connectivity structure of the network. Like the graph embedding algorithm structure preserving embedding, SPML learns a metric which is structure preserving, meaning a connectivity algorithm such as k-nearest neighbors will yield the correct connectivity when applied using the distances from the learned metric. We show a variety of synthetic and real-world experiments where SPML predicts link patterns from node features more accurately than standard techniques. We further demonstrate a method for optimizing SPML based on stochastic gradient descent which removes the running-time dependency on the size of the network and allows the method to easily scale to networks of thousands of nodes and millions of edges.