AITopics | Nearest Neighbor Methods


Nearest Neighbor Methods


k-Nearest Neighbors & Anomaly Detection Tutorial

#artificialintelligence Oct-6-2016, 14:36:58 GMT

Announcement: Layman Tutorials for Data Science site Annalyzin is now called Algobeans! Have you ever wondered about the difference between red and white wine? Some assume that red wine is made from red grapes, and white wine is made from white grapes.

artificial intelligence, data mining, machine learning, (15 more...)

#artificialintelligence

Industry: Consumer Products & Services > Food, Beverage, Tobacco & Cannabis > Beverages (1.00)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.87)


Data Science: Supervised Machine Learning in Python

@machinelearnbot Oct-1-2016, 10:31:05 GMT

In recent years, we've seen a resurgence in AI, or artificial intelligence, and machine learning. Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts. Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning. Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.

artificial intelligence, machine learning, reinforcement learning, (6 more...)

@machinelearnbot

Genre: Instructional Material > Course Syllabus & Notes (0.68)

Industry:

Automobiles & Trucks (0.78)
Information Technology (0.72)
Leisure & Entertainment > Games (0.57)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.55)
(3 more...)


An Introduction to Machine Learning in Julia

#artificialintelligence Sep-30-2016, 23:30:34 GMT

Machine learning is now pervasive in every field of inquiry and has led to breakthroughs in fields ranging from medical diagnosis to online advertising. Practical machine learning is quite computationally intensive, whether it involves millions of repetitions of simple mathematical methods such as Euclidean distance or more intricate optimizers or backpropagation algorithms. Such computationally intensive techniques need a fast and expressive language, one that enables scientists to write simple, readable code that performs well. In this post, we introduce a simple machine learning algorithm called K Nearest Neighbors, and demonstrate certain Julia features that allow for its easy and efficient implementation. We will demonstrate that the code we write is inherently generic, and show the use of the same code to run on GPUs via the ArrayFire package.
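To make the "millions of repetitions" of Euclidean distance concrete, here is a minimal sketch of the computation. It is written in Python rather than Julia and is not the post's code; the function names `euclidean_distance` and `pairwise_distances` are illustrative assumptions.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pairwise_distances(queries, references):
    """Brute force: distance from every query point to every reference point.

    The cost grows as len(queries) * len(references) * dimensions, which is
    why this simple formula dominates the running time on large data sets.
    """
    return [[euclidean_distance(q, r) for r in references] for q in queries]


print(euclidean_distance((0, 0), (3, 4)))  # → 5.0
```

The nested loop in `pairwise_distances` is the part a faster language or a GPU array library would be expected to speed up.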

artificial intelligence, computation, machine learning, (7 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.59)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.52)


Decision Trees and Political Party Classification

#artificialintelligence Sep-29-2016, 08:00:26 GMT

Last time we investigated the k-nearest-neighbors algorithm and the underlying idea that one can learn a classification rule by copying the known classification of nearby data points. This required that we view our data as sitting inside a metric space; that is, we imposed a kind of geometric structure on our data. One glaring problem is that there may be no reasonable way to do this. While we mentioned scaling issues and provided a number of possible metrics in our primer, a more common problem is that the data simply isn't numeric. For instance, a poll of US citizens might ask the respondent to select which of a number of issues he cares most about. There could be 50 choices, and there is no reasonable way to assign these numerical values so that all are equidistant in the resulting metric space. Another issue is that the quality of the data could be bad. For instance, there may be missing values for some attributes (e.g., a respondent may neglect to answer one or more questions).

artificial intelligence, decision tree learning, machine learning, (17 more...)

#artificialintelligence

Country: North America > United States (0.14)

Industry: Government (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.54)


ŷhat Intuitive Classification using KNN and Python

#artificialintelligence Sep-24-2016, 20:15:30 GMT

K-nearest neighbors, or KNN, is a supervised learning algorithm for either classification or regression. It's super intuitive and has been applied to many types of problems. To make a personalized offer to one customer, you might employ KNN to find similar customers and base your offer on their purchase behaviors. KNN has also been applied to medical diagnosis and credit scoring. This is a post about the K-nearest neighbors algorithm and Python.
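The idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the post's code: the function name `knn_predict`, the toy customer data, and the "budget"/"premium" labels are all made up for the example, and the sketch assumes numeric features with Euclidean distance and a simple majority vote.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Rank every training example by its Euclidean distance to the query.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: math.dist(pair[0], query),
    )
    # Vote among the labels of the k closest examples.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical customers described by (visits per month, average basket size).
customers = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
segments = ["budget", "budget", "budget", "premium", "premium", "premium"]

print(knn_predict(customers, segments, (1.1, 1.0), k=3))  # → budget
```

For regression, the same neighbor search applies, with the vote replaced by an average of the neighbors' target values.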

artificial intelligence, machine learning, neighbor, (11 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)


Using Z-values to efficiently compute k-nearest neighbors for Apache Flink – Insight Data

#artificialintelligence Sep-5-2016, 06:35:31 GMT

In an earlier post, I described work that I had initially done as an Insight Data Engineering Fellow. That work, now merged into Flink's master branch, was to do an efficient exact k-nearest neighbors (KNN) query using quadtrees. I have since worked on an approximate version of the KNN algorithm, and I will discuss one method I used for the approximate version using Z-value based hashing. For large and high dimensional data sets, an exact k-nearest neighbors query can become infeasible. There are many algorithms that reduce the dimensionality of the points by hashing them to lower dimensions.
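As a rough illustration of the idea, a Z-value (also called a Morton code) maps a multi-dimensional point to a single integer by interleaving the bits of its coordinates, so that points close in space tend to be close in the sorted one-dimensional order. The sketch below is a generic Python version for two-dimensional non-negative integer coordinates; it is not the Flink implementation, and the names `z_value` and `approx_knn` and the `window` parameter are assumptions made for the example.

```python
import bisect
import math


def z_value(x, y, bits=16):
    """Interleave the bits of x and y into one Morton (Z-order) code."""
    if not (0 <= x < (1 << bits) and 0 <= y < (1 << bits)):
        raise ValueError("coordinates must be non-negative and fit in `bits` bits")
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # bits of x go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # bits of y go to odd positions
    return code


def approx_knn(points, query, k=2, window=4):
    """Approximate k-nearest neighbors using the Z-order of the points.

    Only the `window` points on either side of the query's position in the
    sorted Z-order are compared exactly, so true neighbors can be missed.
    """
    ordered = sorted(points, key=lambda p: z_value(*p))
    codes = [z_value(*p) for p in ordered]
    pos = bisect.bisect_left(codes, z_value(*query))
    candidates = ordered[max(0, pos - window):pos + window]
    return sorted(candidates, key=lambda p: math.dist(p, query))[:k]


print(z_value(3, 5))  # → 39
```

The trade-off is that the exact distance is computed only for a small candidate window instead of the whole data set; a larger `window` costs more but misses fewer true neighbors.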

artificial intelligence, machine learning, z-value, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)


Newbie's Guide to ML -- Part 3 – ML for Newbies

#artificialintelligence Aug-29-2016, 10:41:21 GMT

In part 1 I gave a brief introduction to classification. Just to recap, classification is the problem of identifying which group a piece of data belongs to. It's an example of supervised learning because the classifier predicts the classes based on the training data fed to it. An example of classification is finding out whether an email is spam or not. More formally, classification is about finding a model that distinguishes one class of data from another, so that it can predict the class of data whose class is unknown.

artificial intelligence, classification, machine learning, (12 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.40)


Wasserstein Discriminant Analysis

Flamary, Rémi, Cuturi, Marco, Courty, Nicolas, Rakotomamonjy, Alain

arXiv.org Machine Learning Aug-29-2016

Wasserstein Discriminant Analysis (WDA) is a new supervised method that can improve classification of high-dimensional data by computing a suitable linear map onto a lower dimensional subspace. Following the blueprint of classical Linear Discriminant Analysis (LDA), WDA selects the projection matrix that maximizes the ratio of two quantities: the dispersion of projected points coming from different classes, divided by the dispersion of projected points coming from the same class. To quantify dispersion, WDA uses regularized Wasserstein distances, rather than cross-variance measures which have been usually considered, notably in LDA. Thanks to the underlying principles of optimal transport, WDA is able to capture both global (at distribution scale) and local (at samples scale) interactions between classes. Regularized Wasserstein distances can be computed using the Sinkhorn matrix scaling algorithm; we show that the optimization of WDA can be tackled using automatic differentiation of Sinkhorn iterations. Numerical experiments show promising results both in terms of prediction and visualization on toy examples and real life datasets such as MNIST and on deep features obtained from a subset of the Caltech dataset.
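For readers unfamiliar with Sinkhorn matrix scaling, the following is a minimal pure-Python sketch of the basic iteration for entropy-regularized optimal transport between two discrete distributions. It is a generic textbook version, not the paper's WDA code, and it omits the projection matrix and the automatic differentiation step. The names `sinkhorn` and `transport_cost`, the kernel convention `exp(-cost / reg)`, and the fixed iteration count are assumptions for the example.

```python
import math


def sinkhorn(a, b, cost, reg=0.1, n_iters=200):
    """Return an approximate transport plan between histograms a and b.

    a, b  : lists of non-negative weights, each summing to 1
    cost  : cost[i][j] is the cost of moving mass from bin i of a to bin j of b
    reg   : entropic regularization strength (assumed convention: K = exp(-cost / reg))
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(n_iters):
        # Alternately rescale rows and columns so the plan matches a and b.
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]


def transport_cost(plan, cost):
    """Total cost of a plan: the sum of plan[i][j] * cost[i][j]."""
    return sum(p * c for row_p, row_c in zip(plan, cost) for p, c in zip(row_p, row_c))


plan = sinkhorn([0.5, 0.5], [0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
print(transport_cost(plan, [[0.0, 1.0], [1.0, 0.0]]))
```

Very small `reg` values relative to the costs can make kernel entries underflow to zero and cause a division by zero in this naive version; practical implementations usually work in the log domain.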

artificial intelligence, machine learning, wasserstein distance, (17 more...)

arXiv.org Machine Learning

1608.08063

Country:

North America > United States (0.32)
Europe (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.47)


Every Data Science Interview Boiled Down To Five Basic Questions

#artificialintelligence Aug-28-2016, 15:30:25 GMT

Data science interviews are daunting, complicated gauntlets for many. But despite the ways they're evolving, the technical portion of the typical data science interview tends to be pretty predictable. The questions most candidates face usually cover behavior, mathematics, statistics, coding, and scenarios. However they differ in their particulars, those questions may be easier to answer if you can identify which bucket each one falls into. Here's a breakdown, and what you can do to prepare.

artificial intelligence, knowledge, machine learning, (12 more...)

#artificialintelligence

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.30)


About Feature Scaling and Normalization

#artificialintelligence Aug-22-2016, 03:10:28 GMT

The result of standardization (or Z-score normalization) is that the features will be rescaled so that they have the properties of a standard normal distribution with mean μ = 0 and standard deviation σ = 1. Standardizing the features so that they are centered around 0 with a standard deviation of 1 is not only important if we are comparing measurements that have different units, but it is also a general requirement for many machine learning algorithms. Intuitively, we can think of gradient descent as a prominent example (an optimization algorithm often used in logistic regression, SVMs, perceptrons, neural networks, etc.); with features being on different scales, certain weights may update faster than others since the feature values play a role in the weight updates. Other intuitive examples include K-Nearest Neighbor algorithms and clustering algorithms that use, for example, Euclidean distance measures. In fact, the only family of algorithms that I could think of being scale-invariant are tree-based methods. Let's take the general CART decision tree algorithm. Without going into much depth regarding information gain and impurity measures, we can think of the decision as "is feature x_i >= some_val?"
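As a small worked example of Z-score standardization, the sketch below rescales one feature column to mean 0 and standard deviation 1 using only the Python standard library. The function name `standardize` is illustrative, and the use of the population standard deviation (rather than the sample one) is an assumption of this sketch.

```python
import statistics


def standardize(values):
    """Rescale a feature column to mean 0 and standard deviation 1."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population standard deviation (assumed)
    if sigma == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(x - mu) / sigma for x in values]


# A feature with mean 5 and population standard deviation 2.
heights = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(standardize(heights)[0])  # → -1.5
```

For distance-based methods such as K-Nearest Neighbors, each feature column would be standardized separately, so that no single feature dominates the Euclidean distance merely because of its units.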

artificial intelligence, machine learning, standardization, (12 more...)

#artificialintelligence

Country:

North America > United States > California > Orange County > Irvine (0.05)
Europe > Italy > Liguria > Genoa (0.05)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.49)
