AITopics | Nearest Neighbor Methods

Collaborating Authors

Nearest Neighbor Methods

News Overviews Instructional Materials AI-Alerts Classics

K - Nearest Neighbors - KNN Fun and Easy Machine Learning

#artificialintelligenceAug-7-2017, 20:50:21 GMT

In pattern recognition, the KNN algorithm is a method for classifying objects based on closest training examples in the feature space. KNN is a type of instance-based learning, or lazy learning where the function is only approximated locally and all computation is delayed until classification. The KNN is the fundamental and simplest classification technique when there is little or no prior knowledge about the distribution of the data. The K in KNN refers to number of nearest neighbors that the classifier will use to make its predication. In this video we use Game of Thrones example to explain kNN.

artificial intelligence, fun and easy machine learning, inductive learning, (2 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.85)
Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (0.67)

Add feedback

KNN Ensembles for Tweedie Regression: The Power of Multiscale Neighborhoods

Farrelly, Colleen M.

arXiv.org Machine LearningJul-29-2017

Very few K-nearest-neighbor (KNN) ensembles exist, despite the efficacy of this approach in regression, classification, and outlier detection. Those that do exist focus on bagging features, rather than varying k or bagging observations; it is unknown whether varying k or bagging observations can improve prediction. Given recent studies from topological data analysis, varying k may function like multiscale topological methods, providing stability and better prediction, as well as increased ensemble diversity. This paper explores 7 KNN ensemble algorithms combining bagged features, bagged observations, and varied k to understand how each of these contribute to model fit. Specifically, these algorithms are tested on Tweedie regression problems through simulations and 6 real datasets; results are compared to state-of-the-art machine learning models including extreme learning machines, random forest, boosted regression, and Morse-Smale regression. Results on simulations suggest gains from varying k above and beyond bagging features or samples, as well as the robustness of KNN ensembles to the curse of dimensionality. KNN regression ensembles perform favorably against state-of-the-art algorithms and dramatically improve performance over KNN regression. Further, real dataset results suggest varying k is a good strategy in general (particularly for difficult Tweedie regression problems) and that KNN regression ensembles often outperform state-of-the-art methods. These results for k-varying ensembles echo recent theoretical results in topological data analysis, where multidimensional filter functions and multiscale coverings provide stability and performance gains over single-dimensional filters and single-scale covering. This opens up the possibility of leveraging multiscale neighborhoods and multiple measures of local geometry in ensemble methods.

artificial intelligence, ensemble, machine learning, (18 more...)

arXiv.org Machine Learning

1708.02122

Genre:

Research Report > New Finding (0.66)
Research Report > Promising Solution (0.48)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Consumer Health (0.68)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.49)

Add feedback

Fast k-Nearest Neighbour Search via Prioritized DCI

Li, Ke, Malik, Jitendra

arXiv.org Artificial IntelligenceJul-20-2017

Most exact methods for k-nearest neighbour search suffer from the curse of dimensionality; that is, their query times exhibit exponential dependence on either the ambient or the intrinsic dimensionality. Dynamic Continuous Indexing (DCI) (Li & Malik, 2016) offers a promising way of circumventing the curse and successfully reduces the dependence of query time on intrinsic dimensionality from exponential to sublinear. In this paper, we propose a variant of DCI, which we call Prioritized DCI, and show a remarkable improvement in the dependence of query time on intrinsic dimensionality. In particular, a linear increase in intrinsic dimensionality, or equivalently, an exponential increase in the number of points near a query, can be mostly counteracted with just a linear increase in space. We also demonstrate empirically that Prioritized DCI significantly outperforms prior methods. In particular, relative to Locality-Sensitive Hashing (LSH), Prioritized DCI reduces the number of distance evaluations by a factor of 14 to 116 and the memory consumption by a factor of 21.

artificial intelligence, dimensionality, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1703.0044

Country:

North America > United States > California > Alameda County > Berkeley (0.14)
North America > Canada > Ontario > Toronto (0.14)
Oceania > Australia > New South Wales > Sydney (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.68)

Add feedback

Exemplar-Centered Supervised Shallow Parametric Data Embedding

Min, Martin Renqiang, Guo, Hongyu, Song, Dongjin

arXiv.org Machine LearningJul-5-2017

Metric learning methods for dimensionality reduction in combination with k-Nearest Neighbors (kNN) have been extensively deployed in many classification, data embedding, and information retrieval applications. However, most of these approaches involve pairwise training data comparisons, and thus have quadratic computational complexity with respect to the size of training set, preventing them from scaling to fairly big datasets. Moreover, during testing, comparing test data against all the training data points is also expensive in terms of both computational cost and resources required. Furthermore, previous metrics are either too constrained or too expressive to be well learned. To effectively solve these issues, we present an exemplar-centered supervised shallow parametric data embedding model, using a Maximally Collapsing Metric Learning (MCML) objective. Our strategy learns a shallow high-order parametric embedding function and compares training/test data only with learned or precomputed exemplars, resulting in a cost function with linear computational complexity for both training and testing. We also empirically demonstrate, using several benchmark datasets, that for classification in two-dimensional embedding space, our approach not only gains speedup of kNN by hundreds of times, but also outperforms state-of-the-art supervised embedding approaches.

artificial intelligence, exemplar, machine learning, (17 more...)

arXiv.org Machine Learning

1702.06602

Country:

North America > United States (0.69)
North America > Canada > Ontario (0.46)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.68)

Add feedback

k-nearest neighbor algorithm using Python

@machinelearnbotJun-21-2017, 19:45:26 GMT

This article was written by Natasha Latysheva. Here we publish a short version, with references to full source code in the original article. In machine learning, you may often wish to build predictors that allows to classify things into categories based on some set of associated values. For example, it is possible to provide a diagnosis to a patient based on data from previous patients. Many algorithms have been developed for automated classification, and common ones include random forests, support vector machines, Naïve Bayes classifiers, and many types of neural networks.

artificial intelligence, k-nearest neighbor algorithm, machine learning, (8 more...)

@machinelearnbot

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.56)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.94)

Add feedback

Prototypical Networks for Few-shot Learning

Snell, Jake, Swersky, Kevin, Zemel, Richard S.

arXiv.org Machine LearningJun-19-2017

We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend prototypical networks to zero-shot learning and achieve state-of-the-art results on the CU-Birds dataset.

artificial intelligence, machine learning, prototypical network, (18 more...)

arXiv.org Machine Learning

1703.05175

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.46)

Add feedback

Introduction to Anomaly Detection

#artificialintelligenceMay-27-2017, 21:46:26 GMT

The simplest approach to identifying irregularities in data is to flag the data points that deviate from common statistical properties of a distribution, including mean, median, mode, and quantiles. Let's say the definition of an anomalous data point is one that deviates by a certain standard deviation from the mean. Traversing mean over time-series data isn't exactly trivial, as it's not static. You would need a rolling window to compute the average across the data points. Technically, this is called a rolling average or a moving average, and it's intended to smooth short-term fluctuations and highlight long-term ones.

artificial intelligence, data mining, machine learning, (7 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.40)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.34)

Add feedback

Practical Machine Learning Tutorial with Python Intro p.1

#artificialintelligenceMay-23-2017, 19:15:53 GMT

The objective of this course is to give you a wholistic understanding of machine learning, covering theory, application, and inner workings of supervised, unsupervised, and deep learning algorithms. In this series, we'll be covering linear regression, K Nearest Neighbors, Support Vector Machines (SVM), flat clustering, hierarchical clustering, and neural networks. For each major algorithm that we cover, we will discuss the high level intuitions of the algorithms and how they are logically meant to work. Next, we'll apply the algorithms in code using real world data sets along with a module, such as with Scikit-Learn. Finally, we'll be diving into the inner workings of each of the algorithms by recreating them in code, from scratch, ourselves, including all of the math involved.

algorithm, artificial intelligence, deep learning, (4 more...)

#artificialintelligence

Genre: Instructional Material > Course Syllabus & Notes (0.59)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.59)

Add feedback

Implementing Your Own k-Nearest Neighbour Algorithm Using Python

@machinelearnbotMay-22-2017, 07:02:02 GMT

A detailed explanation of one of the most used machine learning algorithms, k-Nearest Neighbors, and its implementation from scratch in Python.

artificial intelligence, machine learning, own k-nearest neighbour algorithm, (2 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback

ŷhat Intuitive Classification using KNN and Python

@machinelearnbotMay-18-2017, 18:05:30 GMT

K-nearest neighbors, or KNN, is a supervised learning algorithm for either classification or regression. It's super intuitive and has been applied to many types of problems. To make a personalized offer to one customer, you might employ KNN to find similar customers and base your offer on their purchase behaviors. KNN has also been applied to medical diagnosis and credit scoring. This is a post about the K-nearest neighbors algorithm and Python.

artificial intelligence, machine learning, neighbor, (11 more...)

@machinelearnbot

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)

Add feedback