Goto

Collaborating Authors

 Nearest Neighbor Methods


Choice of neighbor order in nearest-neighbor classification

arXiv.org Machine Learning

The $k$th-nearest neighbor rule is arguably the simplest and most intuitively appealing nonparametric classification procedure. However, application of this method is inhibited by lack of knowledge about its properties, in particular, about the manner in which it is influenced by the value of $k$; and by the absence of techniques for empirical choice of $k$. In the present paper we detail the way in which the value of $k$ determines the misclassification error. We consider two models, Poisson and Binomial, for the training samples. Under the first model, data are recorded in a Poisson stream and are "assigned" to one or other of the two populations in accordance with the prior probabilities. In particular, the total number of data in both training samples is a Poisson-distributed random variable. Under the Binomial model, however, the total number of data in the training samples is fixed, although again each data value is assigned in a random way. Although the values of risk and regret associated with the Poisson and Binomial models are different, they are asymptotically equivalent to first order, and also to the risks associated with kernel-based classifiers that are tailored to the case of two derivatives. These properties motivate new methods for choosing the value of $k$.


Supervised functional classification: A theoretical remark and some comparisons

arXiv.org Machine Learning

The problem of supervised classification (or discrimination) with functional data is considered, with a special interest on the popular k-nearest neighbors (k-NN) classifier. First, relying on a recent result by Cerou and Guyader (2006), we prove the consistency of the k-NN classifier for functional data whose distribution belongs to a broad family of Gaussian processes with triangular covariance functions. Second, on a more practical side, we check the behavior of the k-NN method when compared with a few other functional classifiers. This is carried out through a small simulation study and the analysis of several real functional data sets. While no global "uniform" winner emerges from such comparisons, the overall performance of the k-NN method, together with its sound intuitive motivation and relative simplicity, suggests that it could represent a reasonable benchmark for the classification problem with functional data.


On the underestimation of model uncertainty by Bayesian K-nearest neighbors

arXiv.org Machine Learning

When using the K-nearest neighbors method, one often ignores uncertainty in the choice of K. To account for such uncertainty, Holmes and Adams (2002) proposed a Bayesian framework for K-nearest neighbors (KNN). Their Bayesian KNN (BKNN) approach uses a pseudo-likelihood function, and standard Markov chain Monte Carlo (MCMC) techniques to draw posterior samples. Holmes and Adams (2002) focused on the performance of BKNN in terms of misclassification error but did not assess its ability to quantify uncertainty. We present some evidence to show that BKNN still significantly underestimates model uncertainty.


Classification Constrained Dimensionality Reduction

arXiv.org Machine Learning

Dimensionality reduction is a topic of recent interest. In this paper, we present the classification constrained dimensionality reduction (CCDR) algorithm to account for label information. The algorithm can account for multiple classes as well as the semi-supervised setting. We present an out-of-sample expressions for both labeled and unlabeled data. For unlabeled data, we introduce a method of embedding a new point as preprocessing to a classifier. For labeled data, we introduce a method that improves the embedding during the training phase using the out-of-sample extension. We investigate classification performance using the CCDR algorithm on hyper-spectral satellite imagery data. We demonstrate the performance gain for both local and global classifiers and demonstrate a 10% improvement of the $k$-nearest neighbors algorithm performance. We present a connection between intrinsic dimension estimation and the optimal embedding dimension obtained using the CCDR algorithm.


Large Margin Component Analysis

Neural Information Processing Systems

Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines.


Geometric entropy minimization (GEM) for anomaly detection and localization

Neural Information Processing Systems

We introduce a novel adaptive nonparametric anomaly detection approach, called GEM, that is based on the minimal covering properties of K-point entropic graphs when constructed on N training samples from a nominal probability distribution. Such graphs have the property that as N their span recovers the entropy minimizing set that supports at least ρ K/N(100)% of the mass of the Lebesgue part of the distribution. When a test sample falls outside of the entropy minimizing set an anomaly can be declared at a statistical level of significance α 1 ρ. A method for implementing this nonparametric anomaly detector is proposed that approximates this minimum entropy set by the influence region of a K-point entropic graph built on the training data. By implementing an incremental leave-one-out k-nearest neighbor graph on resampled subsets of the training data GEM can efficiently detect outliers at a given level of significance and compute their empirical p-values. We illustrate GEM for several simulated and real data sets in high dimensional feature spaces.


Large Margin Component Analysis

Neural Information Processing Systems

Metric learning has been shown to significantly improve the accuracy of k-nearest neighbor (kNN) classification. In problems involving thousands of features, distance learning algorithms cannot be used due to overfitting and high computational complexity. In such cases, previous work has relied on a two-step solution: first apply dimensionality reduction methods to the data, and then learn a metric in the resulting low-dimensional subspace. In this paper we show that better classification performance can be achieved by unifying the objectives of dimensionality reduction and metric learning. We propose a method that solves for the low-dimensional projection of the inputs, which minimizes a metric objective aimed at separating points in different classes by a large margin. This projection is defined by a significantly smaller number of parameters than metrics learned in input space, and thus our optimization reduces the risks of overfitting. Theory and results are presented for both a linear as well as a kernelized version of the algorithm. Overall, we achieve classification rates similar, and in several cases superior, to those of support vector machines.


Geometric entropy minimization (GEM) for anomaly detection and localization

Neural Information Processing Systems

We introduce a novel adaptive nonparametric anomaly detection approach, called GEM, that is based on the minimal covering properties of K-point entropic graphs when constructed on N training samples from a nominal probability distribution. Such graphs have the property that as N their span recovers the entropy minimizing set that supports at least ρ K/N(100)% of the mass of the Lebesgue part of the distribution. When a test sample falls outside of the entropy minimizing set an anomaly can be declared at a statistical level of significance α 1 ρ. A method for implementing this nonparametric anomaly detector is proposed that approximates this minimum entropy set by the influence region of a K-point entropic graph built on the training data. By implementing an incremental leave-one-out k-nearest neighbor graph on resampled subsets of the training data GEM can efficiently detect outliers at a given level of significance and compute their empirical p-values. We illustrate GEM for several simulated and real data sets in high dimensional feature spaces.


Geometric entropy minimization (GEM) for anomaly detection and localization

Neural Information Processing Systems

We introduce a novel adaptive nonparametric anomaly detection approach, called GEM, that is based on the minimal covering properties of K-point entropic graphs when constructed on N training samples from a nominal probability distribution. Suchgraphs have the property that as N their span recovers the entropy minimizing set that supports at least ρ K/N(100)% of the mass of the Lebesgue part of the distribution. When a test sample falls outside of the entropy minimizing set an anomaly can be declared at a statistical level of significance α 1 ρ. A method for implementing this nonparametric anomaly detector is proposed that approximates this minimum entropy set by the influence region of a K-point entropic graph built on the training data. By implementing an incremental leave-one-out k-nearest neighbor graph on resampled subsets of the training data GEM can efficiently detect outliers at a given level of significance and compute their empirical p-values. We illustrate GEM for several simulated and real data sets in high dimensional feature spaces.


The Loss Rank Principle for Model Selection

arXiv.org Machine Learning

We introduce a new principle for model selection in regression and classification. Many regression models are controlled by some smoothness or flexibility or complexity parameter c, e.g. the number of neighbors to be averaged over in k nearest neighbor (kNN) regression or the polynomial degree in regression with polynomials. Let f_D^c be the (best) regressor of complexity c on data D. A more flexible regressor can fit more data D' well than a more rigid one. If something (here small loss) is easy to achieve it's typically worth less. We define the loss rank of f_D^c as the number of other (fictitious) data D' that are fitted better by f_D'^c than D is fitted by f_D^c. We suggest selecting the model complexity c that has minimal loss rank (LoRP). Unlike most penalized maximum likelihood variants (AIC,BIC,MDL), LoRP only depends on the regression function and loss function. It works without a stochastic noise model, and is directly applicable to any non-parametric regressor, like kNN. In this paper we formalize, discuss, and motivate LoRP, study it for specific regression problems, in particular linear ones, and compare it to other model selection schemes.