AITopics | Nearest Neighbor Methods

Collaborating Authors

Nearest Neighbor Methods

News Overviews Instructional Materials AI-Alerts Classics

A Bayes consistent 1-NN classifier

arXiv.org Machine LearningApr-1-2017

We show that a simple modification of the 1-nearest neighbor classifier yields a strongly Bayes consistent learner. Prior to this work, the only strongly Bayes consistent proximity-based method was the k-nearest neighbor classifier, for k growing appropriately with sample size. We will argue that a margin-regularized 1-NN enjoys considerable statistical and algorithmic advantages over the k-NN classifier. These include user-friendly finite-sample error bounds, as well as time- and memory-efficient learning and test-point evaluation algorithms with a principled speed-accuracy tradeoff. Encouraging empirical results are reported.

artificial intelligence, classifier, machine learning, (13 more...)

arXiv.org Machine Learning

1407.0208

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback

Doing Machine Learning (K Nearest Neighbors) in a Messenger Chatbot

#artificialintelligenceMar-16-2017, 13:15:15 GMT

Chatbots are taking over the world. Companies want them, developers are creating them, and more important, people are eager to try them. The use cases and experiences a chatbot can bring seem to be unlimited -- there are bots for managing group purchases, bots for helping people to obtain their immigration visas, and even bots for tracking flights. So, eventually the hype got me and I decided to build my first chatbot. My idea for this chatbot was to see how I could manually train and use a machine learning model through a chat-based interface, while trying several ideas and concepts that could enhance the experience of using a machine learning model in this way.

artificial intelligence, machine learning, natural language, (6 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.94)

Add feedback

Minimax-optimal semi-supervised regression on unknown manifolds

Moscovich, Amit, Jaffe, Ariel, Nadler, Boaz

arXiv.org Machine LearningMar-6-2017

We consider semi-supervised regression when the predictor variables are drawn from an unknown manifold. A simple two step approach to this problem is to: (i) estimate the manifold geodesic distance between any pair of points using both the labeled and unlabeled instances; and (ii) apply a k nearest neighbor regressor based on these distance estimates. We prove that given sufficiently many unlabeled points, this simple method of geodesic kNN regression achieves the optimal finite-sample minimax bound on the mean squared error, as if the manifold were known. Furthermore, we show how this approach can be efficiently implemented, requiring only O(k N log N) operations to estimate the regression function at all N labeled and unlabeled points. We illustrate this approach on two datasets with a manifold structure: indoor localization using WiFi fingerprints and facial pose estimation. In both cases, geodesic kNN is more accurate and much faster than the popular Laplacian eigenvector regressor.

data mining, machine learning, regression, (17 more...)

arXiv.org Machine Learning

1611.02221

Country: North America > United States (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.69)

Add feedback

Optimization of distributions differences for classification

Bonyadi, Mohammad Reza, Tieng, Quang M., Reutens, David C.

arXiv.org Machine LearningMar-2-2017

In this paper we introduce a new classification algorithm called Optimization of Distributions Differences (ODD). The algorithm aims to find a transformation from the feature space to a new space where the instances in the same class are as close as possible to one another while the gravity centers of these classes are as far as possible from one another. This aim is formulated as a multiobjective optimization problem that is solved by a hybrid of an evolutionary strategy and the Quasi-Newton method. The choice of the transformation function is flexible and could be any continuous space function. We experiment with a linear and a non-linear transformation in this paper. We show that the algorithm can outperform 6 other state-of-the-art classification methods, namely naive Bayes, support vector machines, linear discriminant analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, in 12 standard classification datasets. Our results show that the method is less sensitive to the imbalanced number of instances comparing to these methods. We also show that ODD maintains its performance better than other classification methods in these datasets, hence, offers a better generalization ability.

artificial intelligence, dataset, machine learning, (17 more...)

arXiv.org Machine Learning

1703.00989

Country: Oceania > Australia (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)
(2 more...)

Add feedback

How to choose the right algorithm for your machine learning problem

#artificialintelligenceFeb-26-2017, 14:45:28 GMT

With the recent machine learning boom, more and more algorithms have become available that perform exceptionally well on a number of tasks. But knowing beforehand which algorithm will perform best on your specific problem is often not possible. If you had infinite time at your disposal, you could just go through all of them and try them out. The following post shows you a better way to do this, step by step, by relying on known techniques from model selection and hyper-parameter tuning. Before we get in too deep, we want to make sure we brushed up on the basics. In specific, we should know that there are three main categories of machine learning: supervised learning, unsupervised learning, and reinforcement learning.

algorithm, artificial intelligence, machine learning, (9 more...)

#artificialintelligence

Industry: Education > Focused Education > Special Education (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.32)

Add feedback

RIPML: A Restricted Isometry Property based Approach to Multilabel Learning

Soni, Akshay, Mehdad, Yashar

arXiv.org Machine LearningFeb-16-2017

The multilabel learning problem with large number of labels, features, and data-points has generated a tremendous interest recently. A recurring theme of these problems is that only a few labels are active in any given datapoint as compared to the total number of labels. However, only a small number of existing work take direct advantage of this inherent extreme sparsity in the label space. By the virtue of Restricted Isometry Property (RIP), satisfied by many random ensembles, we propose a novel procedure for multilabel learning known as RIPML. During the training phase, in RIPML, labels are projected onto a random low-dimensional subspace followed by solving a least-square problem in this subspace. Inference is done by a k-nearest neighbor (kNN) based approach. We demonstrate the effectiveness of RIPML by conducting extensive simulations and comparing results with the state-of-the-art linear dimensionality reduction based approaches.

artificial intelligence, dataset, machine learning, (15 more...)

arXiv.org Machine Learning

1702.05181

Country: North America > United States (0.46)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Add feedback

Parameter Free Large Margin Nearest Neighbor for Distance Metric Learning

Song, Kun (Northwestern Polytechnical University) | Nie, Feiping (Northwestern Polytechnical University) | Han, Junwei (Northwestern Polytechnical University) | Li, Xuelong (Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences)

AAAI ConferencesFeb-14-2017

We introduce a novel supervised metric learning algorithm named parameter free large margin nearest neighbor (PFLMNN) which can be seen as an improvement of the classical large margin nearest neighbor (LMNN) algorithm. The contributions of our work consist of two aspects. First, our method discards the costterm which shrinks the distances between inquiry input and its k target neighbors (the k nearest neighbors with same labels as inquiry input) in LMNN, and only focuses on improving the action to push the imposters (the samples with different labels form the inquiry input) apart out of the neighborhood of inquiry. As a result, our method does not have the parameter needed to tune on the validating set, which makes it more convenient to use. Second, by leveraging the geometry information of the imposters, we construct a novel cost function to penalize the smalldistances between each inquiry and its imposters. Different from LMNN considering every imposter located in the neighborhood of each inquiry, our method only takes care of the nearest imposters. Because when the nearest imposter is pushed out of the neighborhood of its inquiry, other imposters would be all out. In this way, the constraints in our model are much less than that of LMNN, which makes our method much easier to find the optimal distance metric. Consequently, our method not only learns a better distance metric than LMNN, but also runs faster than LMNN. Extensive experiments on different data sets with various sizes and difficulties are conducted, and the results have shown that, compared with LMNN, PFLMNN achieves better classification results.

artificial intelligence, imposter, machine learning, (17 more...)

AAAI Conferences

Thirty-First AAAI Conference on Artificial Intelligence

Country: Asia > China (0.15)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.69)

Add feedback

Dynamic time warping distance for message propagation classification in Twitter

Jendoubi, Siwar, Martin, Arnaud, Liétard, Ludovic, Yaghlane, Boutheina Ben, Hadji, Hend Ben

arXiv.org Machine LearningJan-26-2017

Social messages classification is a research domain that has attracted the attention of many researchers in these last years. Indeed, the social message is different from ordinary text because it has some special characteristics like its shortness. Then the development of new approaches for the processing of the social message is now essential to make its classification more efficient. In this paper, we are mainly interested in the classification of social messages based on their spreading on online social networks (OSN). We proposed a new distance metric based on the Dynamic Time Warping distance and we use it with the probabilistic and the evidential k Nearest Neighbors (k-NN) classifiers to classify propagation networks (PrNets) of messages. The propagation network is a directed acyclic graph (DAG) that is used to record propagation traces of the message, the traversed links and their types. We tested the proposed metric with the chosen k-NN classifiers on real world propagation traces that were collected from Twitter social network and we got good classification accuracies.

artificial intelligence, machine learning, propagation network, (17 more...)

arXiv.org Machine Learning

doi: 10.1007/978-3-319-20807-7_38

1701.07756

Country: Europe > France (0.28)

Genre: Research Report (0.64)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.58)

Add feedback

k*-Nearest Neighbors: From Global to Local

Anava, Oren, Levy, Kfir Y.

arXiv.org Machine LearningJan-25-2017

The weighted k-nearest neighbors algorithm is one of the most fundamental non-parametric methods in pattern recognition and machine learning. The question of setting the optimal number of neighbors as well as the optimal weights has received much attention throughout the years, nevertheless this problem seems to have remained unsettled. In this paper we offer a simple approach to locally weighted regression/classification, where we make the bias-variance tradeoff explicit. Our formulation enables us to phrase a notion of optimal weights, and to efficiently find these weights as well as the optimal number of neighbors efficiently and adaptively, for each data point whose value we wish to estimate. The applicability of our approach is demonstrated on several datasets, showing superior performance over standard locally weighted methods.

algorithm, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

1701.07266

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback

Why do Decision Trees Work?

#artificialintelligenceJan-6-2017, 00:50:48 GMT

In this article we will discuss the machine learning method called "decision trees", moving quickly over the usual "how decision trees work" and spending time on "why decision trees work." We will write from a computational learning theory perspective, and hope this helps make both decision trees and computational learning theory more comprehensible. The goal of this article is to set up terminology so we can state in one or two sentences why decision trees tend to work well in practice. Newcomers to data science are often disappointed to learn that the job of the data scientist isn't tweaking and inventing new machine learning algorithms. In the "big data" world supervised learning has been a solved problem since at least 1951 (see [FixHodges1951] for neighborhood density methods, see [GordonOlshen1978] for k-nearest neighbor and decision tree methods).

artificial intelligence, decision tree learning, machine learning, (15 more...)

#artificialintelligence

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Texas (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Diagnosis (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.54)

Add feedback