On high-dimensional modifications of the nearest neighbor classifier

Annesha Ghosh, Bilol Banerjee, Anil K. Ghosh

arXiv.org Machine Learning 

In supervised classification, we use a training set of labeled observations from different competing classes to form a decision rule for classifying unlabeled test set observations as accurately as possible. Starting from Fisher (1936), Rao (1948) and Fix and Hodges (1951), several parametric as well as nonparametric classifiers have been developed for this purpose (see, e.g., Duda et al., 2007; Hastie et al., 2009). Among them, the nearest neighbor classifier (see, e.g., Cover and Hart, 1967) is perhaps the most popular one. The k-nearest neighbor classifier (k-NN) classifies an observation x to the class having the maximum number of representatives among the k nearest neighbors of x. This classifier works well if the training sample size is large compared to the dimension of the data. For a suitable choice of k (which increases with the training sample size at an appropriate rate), under some mild regularity conditions, the misclassification rate of the k-NN classifier converges to the Bayes risk (i.e., the misclassification rate of the Bayes classifier) as the training sample size grows to infinity (see, e.g.
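To make the decision rule described above concrete, here is a minimal sketch of the k-NN majority-vote rule. The function name, the use of Euclidean distance, and the toy Gaussian data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def knn_classify(X_train, y_train, x, k=3):
    """Assign x to the class with the most representatives among its
    k nearest training points (Euclidean distance, majority vote)."""
    dists = np.linalg.norm(X_train - x, axis=1)        # distances to all training points
    nearest = np.argsort(dists)[:k]                    # indices of the k nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # class with the maximum count

# Toy example (illustrative only): two well-separated Gaussian classes in 2D.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
X1 = rng.normal(loc=3.0, scale=1.0, size=(50, 2))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 50 + [1] * 50)

print(knn_classify(X_train, y_train, np.array([2.5, 2.5]), k=5))   # expected output: 1
```

In this sketch k is fixed; the consistency result mentioned in the abstract requires k to grow with the training sample size at a suitable rate.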
