AITopics | Nearest Neighbor Methods


Nearest Neighbor Methods


An improvement to k-nearest neighbor classifier

Sarma, T. Hitendra, Viswanath, P., Reddy, D. Sai Koti, Raghava, S. Sri

arXiv.org Machine Learning, Jan-27-2013

The k-nearest neighbor classifier (k-NNC) is simple to use and requires little design time (for example, choosing the value of k), so it is suitable for dynamically varying data sets. There exist some fundamental improvements over the basic k-NNC, such as the weighted k-nearest neighbors classifier (where weights are assigned to the nearest neighbors by linear interpolation) and the use of an artificially generated training set called a bootstrapped training set. These improvements are orthogonal to space reduction and classification time reduction techniques, and hence can be coupled with any of them. The paper proposes another improvement to the basic k-NNC in which the weights of the nearest neighbors are based on a Gaussian distribution (instead of linear interpolation, as in weighted k-NNC); this improvement is also independent of any space reduction or classification time reduction technique. We formally show that our proposed method is closely related to non-parametric density estimation using a Gaussian kernel. We experimentally demonstrate, using various standard data sets, that the proposed method is better than the existing ones in most cases.
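As an illustration of the idea in this abstract, here is a minimal sketch of a k-NN vote in which each neighbor's weight decays as a Gaussian function of its distance. The function name `gaussian_weighted_knn`, the bandwidth parameter `sigma`, and the toy data are assumptions made for this example; the paper's exact weighting scheme may differ.

```python
import math
from collections import defaultdict

def gaussian_weighted_knn(train, query, k=3, sigma=1.0):
    """Classify `query` by a Gaussian-weighted vote of its k nearest neighbors.

    train: list of (point, label) pairs, where point is a sequence of floats.
    sigma: kernel bandwidth (an assumed hyperparameter, not from the paper).
    """
    # Squared Euclidean distance from the query to every training point.
    scored = [
        (sum((a - b) ** 2 for a, b in zip(point, query)), label)
        for point, label in train
    ]
    scored.sort(key=lambda pair: pair[0])

    # Each of the k nearest neighbors votes with weight exp(-d^2 / (2 sigma^2)).
    votes = defaultdict(float)
    for dist_sq, label in scored[:k]:
        votes[label] += math.exp(-dist_sq / (2.0 * sigma ** 2))
    return max(votes, key=votes.get)

# Toy data: one close "a" point and two more distant "b" points.
train = [((0.0, 0.0), "a"), ((2.0, 0.0), "b"), ((2.2, 0.0), "b")]
print(gaussian_weighted_knn(train, (0.5, 0.0), k=3, sigma=0.5))
```

With a small bandwidth the single close neighbor dominates the vote; with a very large bandwidth all weights approach 1 and the rule behaves like an unweighted majority vote.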

artificial intelligence, machine learning, neighbor, (17 more...)

arXiv.org Machine Learning

1301.6324

Country: North America > United States > California > Orange County > Irvine (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)


Combining Feature and Prototype Pruning by Uncertainty Minimization

Sebban, Marc, Nock, Richard

arXiv.org Machine Learning, Jan-16-2013

We focus in this paper on dataset reduction techniques for use in k-nearest neighbor classification. In this context, feature selection and prototype selection have always been treated independently by the standard storage reduction algorithms. While this separation is theoretically justified by the fact that each subproblem is NP-hard, we assume in this paper that a joint storage reduction is in fact more intuitive and can in practice provide better results than two independent processes. Moreover, it avoids many distance calculations by progressively removing useless instances during feature pruning. While standard selection algorithms often optimize accuracy to discriminate the set of solutions, we use in this paper a criterion based on an uncertainty measure within a nearest-neighbor graph. This choice comes from recent results showing that accuracy is not always a suitable criterion to optimize. In our approach, a feature or an instance is removed if its deletion improves the information of the graph. Numerous experiments are presented in this paper, and a statistical analysis shows the relevance of our approach and its tolerance to noise.

accuracy, algorithm, reduction, (16 more...)

arXiv.org Machine Learning

1301.3891

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)


Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments

Popescul, Alexandrin, Ungar, Lyle H., Pennock, David M, Lawrence, Steve

arXiv.org Machine Learning, Jan-10-2013

Recommender systems leverage product and community information to target products to consumers. Researchers have developed collaborative recommenders, content-based recommenders, and (largely ad-hoc) hybrid systems. We propose a unified probabilistic framework for merging collaborative and content-based recommendations. We extend Hofmann's [1999] aspect model to incorporate three-way co-occurrence data among users, items, and item content. The relative influence of collaboration data versus content data is not imposed as an exogenous parameter, but rather emerges naturally from the given data sources. Global probabilistic models coupled with standard Expectation Maximization (EM) learning algorithms tend to drastically overfit in sparse-data situations, as is typical in recommendation applications. We show that secondary content information can often be used to overcome sparsity. Experiments on data from the ResearchIndex library of Computer Science publications show that appropriate mixture models incorporating secondary data produce significantly better quality recommenders than k-nearest neighbors (k-NN). Global probabilistic models also allow more general inferences than local methods like k-NN.

artificial intelligence, information, machine learning, (15 more...)

arXiv.org Machine Learning

1301.2303

Country: North America > United States > Pennsylvania (0.28)

Genre: Research Report (0.82)

Industry:

Media (0.47)
Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.55)


Distance Metric Learning for Kernel Machines

Xu, Zhixiang, Weinberger, Kilian Q., Chapelle, Olivier

arXiv.org Machine Learning, Jan-8-2013

Recent work in metric learning has significantly improved the state of the art in k-nearest neighbor classification. Support vector machines (SVM), particularly with RBF kernels, are among the most popular classification algorithms that use distance metrics to compare examples. This paper provides an empirical analysis of the efficacy of three of the most popular Mahalanobis metric learning algorithms as pre-processing for SVM training. We show that none of these algorithms generates metrics that lead to particularly satisfying improvements for SVM-RBF classification. As a remedy we introduce support vector metric learning (SVML), a novel algorithm that seamlessly combines the learning of a Mahalanobis metric with the training of the RBF-SVM parameters. We demonstrate the capabilities of SVML on nine benchmark data sets of varying sizes and difficulties. In our study, SVML outperforms all alternative state-of-the-art metric learning algorithms in terms of accuracy and establishes itself as a serious alternative to the standard Euclidean metric with model selection by cross-validation.
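For orientation, the standard way a Mahalanobis metric enters an RBF kernel can be written as follows. The symbols $\mathbf{M}$ and $\mathbf{L}$ are notation assumed here for illustration; the abstract does not give the paper's own parameterization.

```latex
% Squared Mahalanobis distance with a positive semidefinite matrix M = L^T L
d_{\mathbf{M}}^{2}(\mathbf{x}, \mathbf{x}')
  = (\mathbf{x} - \mathbf{x}')^{\top} \mathbf{M} \, (\mathbf{x} - \mathbf{x}')
  = \lVert \mathbf{L}(\mathbf{x} - \mathbf{x}') \rVert_{2}^{2},
  \qquad \mathbf{M} = \mathbf{L}^{\top}\mathbf{L} \succeq 0

% RBF kernel evaluated under that metric
k(\mathbf{x}, \mathbf{x}') = \exp\!\left( -\, d_{\mathbf{M}}^{2}(\mathbf{x}, \mathbf{x}') \right)
```

Setting $\mathbf{M} = \tfrac{1}{2\sigma^{2}}\mathbf{I}$ recovers the usual Euclidean RBF kernel with bandwidth $\sigma$, which is the baseline the abstract compares against.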

algorithm, artificial intelligence, machine learning, (12 more...)

arXiv.org Machine Learning

1208.3422

Country:

North America > United States (0.68)
Europe > United Kingdom > England (0.46)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.70)


Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

Noh, Yung-kyun, Park, Frank, Lee, Daniel D.

Neural Information Processing Systems, Dec-31-2012

We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria.
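To make the role of the sequential probability ratio test concrete, the sketch below walks through neighbors from nearest to farthest and stops as soon as Wald's SPRT thresholds are crossed. It is a generic SPRT under an assumed Bernoulli model for neighbor labels, not one of the paper's five criteria; the function name `sprt_adaptive_knn` and the parameters `p`, `alpha`, and `beta` are hypothetical.

```python
import math

def sprt_adaptive_knn(neighbor_labels, p=0.7, alpha=0.05, beta=0.05):
    """Pick k adaptively with Wald's SPRT on binary neighbor labels.

    neighbor_labels: 0/1 labels of the neighbors, sorted nearest first.
    p: assumed probability that a neighbor carries the true class label.
    alpha, beta: target error rates that set the stopping thresholds.
    Returns (predicted_label, number_of_neighbors_used).
    """
    upper = math.log((1 - beta) / alpha)   # accept class 1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept class 0 at or below this
    step = math.log(p / (1 - p))           # log-likelihood ratio per neighbor

    llr = 0.0
    ones = 0
    k_used = 0
    for label in neighbor_labels:
        k_used += 1
        ones += label
        llr += step if label == 1 else -step
        if llr >= upper:
            return 1, k_used
        if llr <= lower:
            return 0, k_used

    # Neighbors exhausted without a decision: fall back to a majority vote
    # (ties go to class 0 in this sketch).
    return (1 if 2 * ones > k_used else 0), k_used

print(sprt_adaptive_knn([1, 1, 1, 1, 0, 0]))
```

Queries whose nearest neighbors agree stop early with a small k, while queries in mixed regions keep acquiring neighbors.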

artificial intelligence, machine learning, nearest neighbor, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)


Comparing K-Nearest Neighbors and Potential Energy Method in classification problem. A case study using KNN applet by E.M. Mirkes and real life benchmark data sets

Shi, Yanshan

arXiv.org Machine Learning, Nov-5-2012

Abstract: The k-nearest neighbors (KNN) method is used in many supervised learning classification problems. The Potential Energy (PE) method has also been developed for classification problems, based on a physical metaphor. The energy potentials used in the experiments are the Yukawa potential and the Gaussian potential. In this paper, I use both an applet and a MATLAB program with real-life benchmark data to analyze the performance of the KNN and PE methods in classification problems. The results show that, in general, the KNN and PE methods perform similarly. In particular, PE with the Yukawa potential performs worse than KNN when the density of the data is higher in the distribution of the database. When the Gaussian potential is applied, the results from PE and KNN behave similarly. The indicators used are correlation coefficients and information gain.

Keywords: K-nearest neighbor, potential energy method, Yukawa potential, Gaussian potential, correlation coefficients, information gain

1. Introduction

The target of supervised learning is to learn a mapping from the input to an output whose correct values are provided. In unsupervised learning, however, no correct values are provided; the only known object is the input data, and the target is to find regularities in the input. Classification is considered a supervised learning task.

artificial intelligence, database, machine learning, (17 more...)

arXiv.org Machine Learning

1211.0879

Country:

North America > United States (0.46)
Europe > United Kingdom > England (0.14)

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)


Universally Consistent Latent Position Estimation and Vertex Classification for Random Dot Product Graphs

Sussman, Daniel L., Tang, Minh, Priebe, Carey E.

arXiv.org Machine Learning, Jul-29-2012

In this work we show that, using the eigen-decomposition of the adjacency matrix, we can consistently estimate latent positions for random dot product graphs provided the latent positions are i.i.d. from some distribution. If class labels are observed for a number of vertices tending to infinity, then we show that the remaining vertices can be classified with error converging to Bayes optimal using the k-nearest-neighbors classification rule. We evaluate the proposed methods on simulated data and a graph derived from Wikipedia.

artificial intelligence, machine learning, vertex, (17 more...)

arXiv.org Machine Learning

1207.6745

Genre: Research Report (0.64)

Technology:

Information Technology > Communications > Social Media (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.50)


Cost-Sensitive Risk Stratification in the Diagnosis of Heart Disease

Uguroglu, Selen (Carnegie Mellon University) | Doyle, Mark (Allegheny General Hospital) | Biederman, Robert (Allegheny General Hospital) | Carbonell, Jaime (Carnegie Mellon University)

AAAI Conferences, Jul-21-2012

We investigate machine learning methods for diagnostic screening of heart disease. Coronary heart disease is the leading cause of death in the US, causing more deaths than all types of cancers combined. Early diagnosis of heart disease in women is harder than it is in men and typically requires the administration of several clinical tests on the patient. Most risk stratification methods aggregate the results of such tests, including risky, invasive procedures that cannot be administered to all patients. In this paper, our goal is to identify patients who are at high risk of having heart disease and related adverse events, using a minimal number of diagnostic tests, especially less invasive ones. The low frequency of patients with severe heart disease in the dataset is challenging for most conventional machine learning methods. To overcome this problem, we develop and apply a cost-sensitive k-nearest neighbor algorithm. Our contributions are twofold. First, we compare the predictive value of several diagnostic procedures for heart disease, including electrocardiography, angiography, and radionuclide perfusion, and conclude that in women's heart disease, certain combinations of non-invasive techniques are more predictive than some of the widely used invasive procedures. Second, we evaluate on held-out data and achieve an AUROC over 0.70, signifying valuable clinical utility, using only the least costly and least invasive tests.
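One common way to make k-NN cost-sensitive, sketched below, is to predict the label that minimizes the total misclassification cost over the k nearest neighbors rather than taking a plain majority vote. This is an assumed formulation for illustration only; the function name `cost_sensitive_knn`, the cost values, and the toy data are hypothetical and not taken from the paper.

```python
def cost_sensitive_knn(train, query, k, cost):
    """Predict the label with the lowest total cost over the k nearest neighbors.

    train: list of (point, label) pairs, where point is a sequence of floats.
    cost: nested dict, cost[true_label][predicted_label] -> non-negative cost.
    """
    # Sort training points by squared Euclidean distance to the query.
    ranked = sorted(
        train,
        key=lambda pl: sum((a - b) ** 2 for a, b in zip(pl[0], query)),
    )
    neighbor_labels = [label for _, label in ranked[:k]]

    def expected_cost(pred):
        # Treat each neighbor's label as a sample of the query's true label.
        return sum(cost[true][pred] for true in neighbor_labels)

    # Sorted candidates make tie-breaking deterministic.
    return min(sorted(cost), key=expected_cost)

# Missing a sick patient is assumed to be ten times worse than a false alarm.
cost = {
    "sick": {"sick": 0, "healthy": 10},
    "healthy": {"sick": 1, "healthy": 0},
}
train = [((0.0,), "healthy"), ((0.2,), "healthy"), ((0.4,), "sick"), ((5.0,), "healthy")]
print(cost_sensitive_knn(train, (0.1,), k=3, cost=cost))
```

With asymmetric costs, a single rare-class neighbor can outweigh a majority of common-class neighbors, which is how this kind of rule compensates for class imbalance.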

diagnostic test, heart disease, procedure, (16 more...)

AAAI Conferences

Twenty-Fourth IAAI Conference

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.89)


Shortest path distance in random k-nearest neighbor graphs

Alamgir, Morteza, von Luxburg, Ulrike

arXiv.org Machine Learning, Jul-9-2012

Consider a weighted or unweighted k-nearest neighbor graph that has been built on n data points drawn randomly according to some density p on R^d. We study the convergence of the shortest path distance in such graphs as the sample size tends to infinity. We prove that for unweighted kNN graphs, this distance converges to an unpleasant distance function on the underlying space whose properties are detrimental to machine learning. We also study the behavior of the shortest path distance in weighted kNN graphs.
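The objects studied in this abstract can be reproduced on a small scale with the sketch below, which builds a symmetric kNN graph and computes shortest path distances with Dijkstra's algorithm. The helper names `knn_graph` and `shortest_path_distance`, the choice of Euclidean edge lengths as weights, and the symmetrization of the graph are assumptions for this example.

```python
import heapq
import math

def knn_graph(points, k, weighted=True):
    """Build a symmetric kNN graph as an adjacency dict {node: {neighbor: weight}}.

    Edge weights are Euclidean lengths if `weighted`, else 1.0 (hop counts).
    """
    n = len(points)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        for d, j in dists[:k]:
            w = d if weighted else 1.0
            adj[i][j] = w
            adj[j][i] = w  # connect both directions so the graph is undirected
    return adj

def shortest_path_distance(adj, source, target):
    """Dijkstra's algorithm; returns math.inf if target is unreachable."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            return d
        if d > best.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < best.get(v, math.inf):
                best[v] = nd
                heapq.heappush(heap, (nd, v))
    return math.inf

points = [(0.0,), (1.0,), (1.5,), (4.0,)]
print(shortest_path_distance(knn_graph(points, 1, weighted=True), 0, 3))
print(shortest_path_distance(knn_graph(points, 1, weighted=False), 0, 3))
```

In the unweighted graph the distance counts hops and ignores how far apart the points actually are, which is the kind of behavior the abstract analyzes in the large-sample limit.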

artificial intelligence, graph, machine learning, (15 more...)

arXiv.org Machine Learning

1206.6381

Country: Europe > Germany (0.46)

Genre: Research Report (0.83)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)


Leveraging Usage Data for Linked Data Movie Entity Summarization

Thalhammer, Andreas, Toma, Ioan, Roa-Valverde, Antonio, Fensel, Dieter

arXiv.org Artificial Intelligence, Apr-12-2012

Novel research in the field of Linked Data focuses on the problem of entity summarization. This field addresses the problem of ranking features according to their importance for the task of identifying a particular entity. Besides enabling a more human-friendly presentation, these summarizations can play a central role in semantic search engines and semantic recommender systems. Current approaches attempt to apply entity summarization based on patterns that are inherent to the data in question. The approach proposed in this paper focuses on the movie domain. It utilizes usage data to support measuring the similarity between movie entities. Using this similarity, it is possible to determine the k-nearest neighbors of an entity. This leads to the idea that features that entities share with their nearest neighbors can be considered significant or important for these entities. Additionally, we introduce a downgrading factor (similar to TF-IDF) to compensate for the high number of commonly occurring features. We exemplify the approach on a movie-ratings dataset that has been linked to Freebase entities.

artificial intelligence, machine learning, natural language, (22 more...)

arXiv.org Artificial Intelligence

1204.2718

Country:

Europe (1.00)
North America > United States (0.69)

Genre: Research Report > New Finding (0.46)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.71)
(2 more...)
