On high-dimensional modifications of the nearest neighbor classifier

Annesha Ghosh, Bilol Banerjee, Anil K. Ghosh

arXiv.org Machine Learning 

In supervised classification, we use a training set of labeled observations from different competing classes to form a decision rule for classifying unlabeled test set observations as accurately as possible. Starting from Fisher (1936), Rao (1948) and Fix and Hodges (1951), several parametric as well as nonparametric classifiers have been developed for this purpose (see, e.g., Duda et al., 2007; Hastie et al., 2009). Among them, the nearest neighbor classifier (see, e.g., Cover and Hart, 1967) is perhaps the most popular one. The k-nearest neighbor classifier (k-NN) classifies an observation x to the class having the maximum number of representatives among the k nearest neighbors of x. This classifier works well if the training sample size is large compared to the dimension of the data. For a suitable choice of k (which increases with the training sample size at an appropriate rate), under some mild regularity conditions, the misclassification rate of the k-NN classifier converges to the Bayes risk (i.e., the misclassification rate of the Bayes classifier) as the training sample size grows to infinity (see, e.g.
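To make the decision rule described above concrete, here is a minimal sketch of the k-NN majority-vote rule. The function name, the use of Euclidean distance, and the toy Gaussian data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def knn_classify(X_train, y_train, x, k=3):
    """Assign x to the class with the most representatives among its
    k nearest training points (Euclidean distance, majority vote)."""
    dists = np.linalg.norm(X_train - x, axis=1)        # distances to all training points
    nearest = np.argsort(dists)[:k]                    # indices of the k nearest neighbors
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                   # class with the maximum count

# Toy example (illustrative only): two well-separated Gaussian classes in 2D.
rng = np.random.default_rng(0)
X0 = rng.normal(loc=0.0, scale=1.0, size=(50, 2))
X1 = rng.normal(loc=3.0, scale=1.0, size=(50, 2))
X_train = np.vstack([X0, X1])
y_train = np.array([0] * 50 + [1] * 50)

print(knn_classify(X_train, y_train, np.array([2.5, 2.5]), k=5))   # expected output: 1
```

In this sketch k is fixed; the consistency result mentioned in the abstract requires k to grow with the training sample size at a suitable rate.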
