Goto

Collaborating Authors

 Nearest Neighbor Methods


Reverse image search engines using out of the box machine learning libraries

@machinelearnbot

We propose a simple, robust, and scalable reverse image search engine that leverages convolutional features from Keras' pre-trained neural networks and the distance metric from Scikit-Learn's K-Nearest Neighbors. We show example queries using data scraped from Google images, and dive deeper in how we use the search engine to track the proliferation of memes from the dark web.


A Beginner's Guide to Machine Learning (in Python)

@machinelearnbot

In this course, you will learn the basics of Machine Learning and Data Mining; almost everything you need to get started. You will understand what Big Data is and what Data Science and Data Analytics is. You will learn algorithms such as Linear Regression, Logistic Regression, Support Vector Machine, K-Nearest Neighbor, Decision Trees, and Neural Networks. You'll also understand how to combine algorithms into ensembles. Preprocessing data will be taught and you will understand how to clean your data, transform it, how to handle categorical features, and how to handle unbalanced data.


Extending Machine Learning Algorithms Udemy

@machinelearnbot

Complex statistics in Machine Learning worry a lot of developers. Knowing statistics helps you build strong Machine Learning models that are optimized for a given problem statement. Understand the real-world examples that discuss the statistical side of Machine Learning and familiarize yourself with it. We will use libraries such as scikit-learn, e1071, randomForest, c50, xgboost, and so on.We will discuss the application of frequently used algorithms on various domain problems, using both Python and R programming.It focuses on the various tree-based machine learning models used by industry practitioners.We will also discuss k-nearest neighbors, Naive Bayes, Support Vector Machine and recommendation engine.By the end of the course, you will have mastered the required statistics for Machine Learning Algorithm and will be able to apply your new skills to any sort of industry problem. Pratap Dangeti develops machine learning and deep learning solutions for structured, image, and text data at TCS, in its research and innovation lab in Bangalore.


Machine Learning with Scikit-learn Udemy

@machinelearnbot

Machine learning is the buzzword bringing computer science and statistics together to build smart and efficient models. Using powerful algorithms and techniques offered by machine learning, you can automate any analytical model. This course examines a variety of machine learning models including popular machine learning algorithms such as k-nearest neighbors, logistic regression, naive Bayes, k-means, decision trees, and artificial neural networks. You will build systems that classify documents, recognize images, detect ads, and more. You'll learn to use scikit-learn's API to extract features from categorical variables, text and images; evaluate model performance; and develop an intuition for how to improve your model's performance.


Multiple-Implementation Testing of Supervised Learning Software

AAAI Conferences

Machine Learning (ML) algorithms are now used in a wide range of application domains in society. Naturally, software implementations of these algorithms have become ubiquitous. Faults in ML software can cause substantial losses in these application domains. Thus, it is very critical to conduct effective testing of ML software to detect and eliminate its faults. However, testing ML software is difficult, partly because producing test oracles used for checking behavior correctness (such as using expected properties or expected test outputs) is challenging. In this paper, we propose an approach of multiple-implementation testing to test supervised learning software, a major type of ML software. In particular, our approach derives a test input's proxy oracle from the majority-voted output running the test input of multiple implementations of the same algorithm (based on a pre-defined percentage threshold). Our approach reports likely those test inputs whose outputs (produced by an implementation under test) are different from the majority-voted outputs as failing tests. We evaluate our approach on two highly popular supervised learning algorithms: k-Nearest Neighbor (kNN) and Naive Bayes (NB). Our results show that our approach is highly effective in detecting faults in real-world supervised learning software. In particular, our approach detects 13 real faults and 1 potential fault from 19 kNN implementations and 16 real faults from 7 NB implementations. Our approach can even detect 7 real faults and 1 potential fault among the three popularly used open-source ML projects (Weka, RapidMiner,ย  and KNIME).


Building & Improving a K-Nearest Neighbors Algorithm in Python

#artificialintelligence

The K-Nearest Neighbors algorithm, K-NN for short, is a classic machine learning work horse algorithm that is often overlooked in the day of deep learning. In this tutorial, we will build a K-NN algorithm in Scikit-Learn and run it on the MNIST dataset. From there, we will build our own K-NN algorithm in the hope of developing a classifier with both better accuracy and classification speed than the Scikit-Learn K-NN. The K-Nearest Neighbors algorithm is a supervised machine learning algorithm that is simple to implement, and yet has the ability to make robust classifications. One of the biggest advantages of K-NN is that it is a lazy-learner.


On the Resistance of Neural Nets to Label Noise

arXiv.org Machine Learning

We investigate the behavior of convolutional neural networks (CNN) in the presence of label noise. We show empirically that CNN prediction for a given test sample depends on the labels of the training samples in its local neighborhood. This is similar to the way that the K-nearest neighbors (K-NN) classifier works. With this understanding, we derive an analytical expression for the expected accuracy of a K-NN, and hence a CNN, classifier for any level of noise. In particular, we show that K-NN, and CNN, are resistant to label noise that is randomly spread across the training set, but are very sensitive to label noise that is concentrated. Experiments on real datasets validate our analytical expression by showing that they match the empirical results for varying degrees of label noise.


Learning to generate classifiers

arXiv.org Machine Learning

We train a network to generate mappings between training sets and classification policies (a 'classifier generator') by conditioning on the entire training set via an attentional mechanism. The network is directly optimized for test set performance on an training set of related tasks, which is then transferred to unseen 'test' tasks. We use this to optimize for performance in the low-data and unsupervised learning regimes, and obtain significantly better performance in the 10-50 datapoint regime than support vector classifiers, random forests, XGBoost, and k-nearest neighbors on a range of small datasets.


Introduction to k-Nearest Neighbors

@machinelearnbot

The k-Nearest-Neighbors (kNN) method of classification is one of the simplest methods in machine learning, and is a great way to introduce yourself to machine learning and classification in general. At its most basic level, it is essentially classification by finding the most similar data points in the training data, and making an educated guess based on their classifications. Although very simple to understand and implement, this method has seen wide application in many domains, such as in recommendation systems, semantic searching, and anomaly detection. As we would need to in any machine learning problem, we must first find a way to represent data points as feature vectors. A feature vector is our mathematical representation of data, and since the desired characteristics of our data may not be inherently numerical, preprocessing and feature-engineering may be required in order to create these vectors.


Local Distance Metric Learning for Nearest Neighbor Algorithm

arXiv.org Machine Learning

Distance metric learning is a successful way to enhance the performance of the nearest neighbor classifier. In most cases, however, the distribution of data does not obey a regular form and may change in different parts of the feature space. Regarding that, this paper proposes a novel local distance metric learning method, namely Local Mahalanobis Distance Learning (LMDL), in order to enhance the performance of the nearest neighbor classifier. LMDL considers the neighborhood influence and learns multiple distance metrics for a reduced set of input samples. The reduced set is called as prototypes which try to preserve local discriminative information as much as possible. The proposed LMDL can be kernelized very easily, which is significantly desirable in the case of highly nonlinear data. The quality as well as the efficiency of the proposed method assesses through a set of different experiments on various datasets and the obtained results show that LDML as well as the kernelized version is superior to the other related state-of-the-art methods.