Nearest Neighbor Methods
A Bayes consistent 1-NN classifier
Kontorovich, Aryeh, Weiss, Roi
We show that a simple modification of the 1-nearest neighbor classifier yields a strongly Bayes consistent learner. Prior to this work, the only strongly Bayes consistent proximity-based method was the k-nearest neighbor classifier, for k growing appropriately with sample size. We will argue that a margin-regularized 1-NN enjoys considerable statistical and algorithmic advantages over the k-NN classifier. These include user-friendly finite-sample error bounds, as well as time- and memory-efficient learning and test-point evaluation algorithms with a principled speed-accuracy tradeoff. Encouraging empirical results are reported.
Doing Machine Learning (K Nearest Neighbors) in a Messenger Chatbot
Chatbots are taking over the world. Companies want them, developers are creating them, and more important, people are eager to try them. The use cases and experiences a chatbot can bring seem to be unlimited -- there are bots for managing group purchases, bots for helping people to obtain their immigration visas, and even bots for tracking flights. So, eventually the hype got me and I decided to build my first chatbot. My idea for this chatbot was to see how I could manually train and use a machine learning model through a chat-based interface, while trying several ideas and concepts that could enhance the experience of using a machine learning model in this way.
Minimax-optimal semi-supervised regression on unknown manifolds
Moscovich, Amit, Jaffe, Ariel, Nadler, Boaz
We consider semi-supervised regression when the predictor variables are drawn from an unknown manifold. A simple two step approach to this problem is to: (i) estimate the manifold geodesic distance between any pair of points using both the labeled and unlabeled instances; and (ii) apply a k nearest neighbor regressor based on these distance estimates. We prove that given sufficiently many unlabeled points, this simple method of geodesic kNN regression achieves the optimal finite-sample minimax bound on the mean squared error, as if the manifold were known. Furthermore, we show how this approach can be efficiently implemented, requiring only O(k N log N) operations to estimate the regression function at all N labeled and unlabeled points. We illustrate this approach on two datasets with a manifold structure: indoor localization using WiFi fingerprints and facial pose estimation. In both cases, geodesic kNN is more accurate and much faster than the popular Laplacian eigenvector regressor.
Optimization of distributions differences for classification
Bonyadi, Mohammad Reza, Tieng, Quang M., Reutens, David C.
In this paper we introduce a new classification algorithm called Optimization of Distributions Differences (ODD). The algorithm aims to find a transformation from the feature space to a new space where the instances in the same class are as close as possible to one another while the gravity centers of these classes are as far as possible from one another. This aim is formulated as a multiobjective optimization problem that is solved by a hybrid of an evolutionary strategy and the Quasi-Newton method. The choice of the transformation function is flexible and could be any continuous space function. We experiment with a linear and a non-linear transformation in this paper. We show that the algorithm can outperform 6 other state-of-the-art classification methods, namely naive Bayes, support vector machines, linear discriminant analysis, multi-layer perceptrons, decision trees, and k-nearest neighbors, in 12 standard classification datasets. Our results show that the method is less sensitive to the imbalanced number of instances comparing to these methods. We also show that ODD maintains its performance better than other classification methods in these datasets, hence, offers a better generalization ability.
How to choose the right algorithm for your machine learning problem
With the recent machine learning boom, more and more algorithms have become available that perform exceptionally well on a number of tasks. But knowing beforehand which algorithm will perform best on your specific problem is often not possible. If you had infinite time at your disposal, you could just go through all of them and try them out. The following post shows you a better way to do this, step by step, by relying on known techniques from model selection and hyper-parameter tuning. Before we get in too deep, we want to make sure we brushed up on the basics. In specific, we should know that there are three main categories of machine learning: supervised learning, unsupervised learning, and reinforcement learning.
RIPML: A Restricted Isometry Property based Approach to Multilabel Learning
The multilabel learning problem with large number of labels, features, and data-points has generated a tremendous interest recently. A recurring theme of these problems is that only a few labels are active in any given datapoint as compared to the total number of labels. However, only a small number of existing work take direct advantage of this inherent extreme sparsity in the label space. By the virtue of Restricted Isometry Property (RIP), satisfied by many random ensembles, we propose a novel procedure for multilabel learning known as RIPML. During the training phase, in RIPML, labels are projected onto a random low-dimensional subspace followed by solving a least-square problem in this subspace. Inference is done by a k-nearest neighbor (kNN) based approach. We demonstrate the effectiveness of RIPML by conducting extensive simulations and comparing results with the state-of-the-art linear dimensionality reduction based approaches.
Parameter Free Large Margin Nearest Neighbor for Distance Metric Learning
Song, Kun (Northwestern Polytechnical University) | Nie, Feiping (Northwestern Polytechnical University) | Han, Junwei (Northwestern Polytechnical University) | Li, Xuelong (Xi'an Institute of Optics and Precision Mechanics, Chinese Academy of Sciences)
We introduce a novel supervised metric learning algorithm named parameter free large margin nearest neighbor (PFLMNN) which can be seen as an improvement of the classical large margin nearest neighbor (LMNN) algorithm. The contributions of our work consist of two aspects. First, our method discards the costterm which shrinks the distances between inquiry input and its k target neighbors (the k nearest neighbors with same labels as inquiry input) in LMNN, and only focuses on improving the action to push the imposters (the samples with different labels form the inquiry input) apart out of the neighborhood of inquiry. As a result, our method does not have the parameter needed to tune on the validating set, which makes it more convenient to use. Second, by leveraging the geometry information of the imposters, we construct a novel cost function to penalize the smalldistances between each inquiry and its imposters. Different from LMNN considering every imposter located in the neighborhood of each inquiry, our method only takes care of the nearest imposters. Because when the nearest imposter is pushed out of the neighborhood of its inquiry, other imposters would be all out. In this way, the constraints in our model are much less than that of LMNN, which makes our method much easier to find the optimal distance metric. Consequently, our method not only learns a better distance metric than LMNN, but also runs faster than LMNN. Extensive experiments on different data sets with various sizes and difficulties are conducted, and the results have shown that, compared with LMNN, PFLMNN achieves better classification results.
Dynamic time warping distance for message propagation classification in Twitter
Jendoubi, Siwar, Martin, Arnaud, Liรฉtard, Ludovic, Yaghlane, Boutheina Ben, Hadji, Hend Ben
Social messages classification is a research domain that has attracted the attention of many researchers in these last years. Indeed, the social message is different from ordinary text because it has some special characteristics like its shortness. Then the development of new approaches for the processing of the social message is now essential to make its classification more efficient. In this paper, we are mainly interested in the classification of social messages based on their spreading on online social networks (OSN). We proposed a new distance metric based on the Dynamic Time Warping distance and we use it with the probabilistic and the evidential k Nearest Neighbors (k-NN) classifiers to classify propagation networks (PrNets) of messages. The propagation network is a directed acyclic graph (DAG) that is used to record propagation traces of the message, the traversed links and their types. We tested the proposed metric with the chosen k-NN classifiers on real world propagation traces that were collected from Twitter social network and we got good classification accuracies.
k*-Nearest Neighbors: From Global to Local
The weighted k-nearest neighbors algorithm is one of the most fundamental non-parametric methods in pattern recognition and machine learning. The question of setting the optimal number of neighbors as well as the optimal weights has received much attention throughout the years, nevertheless this problem seems to have remained unsettled. In this paper we offer a simple approach to locally weighted regression/classification, where we make the bias-variance tradeoff explicit. Our formulation enables us to phrase a notion of optimal weights, and to efficiently find these weights as well as the optimal number of neighbors efficiently and adaptively, for each data point whose value we wish to estimate. The applicability of our approach is demonstrated on several datasets, showing superior performance over standard locally weighted methods.
Why do Decision Trees Work?
In this article we will discuss the machine learning method called "decision trees", moving quickly over the usual "how decision trees work" and spending time on "why decision trees work." We will write from a computational learning theory perspective, and hope this helps make both decision trees and computational learning theory more comprehensible. The goal of this article is to set up terminology so we can state in one or two sentences why decision trees tend to work well in practice. Newcomers to data science are often disappointed to learn that the job of the data scientist isn't tweaking and inventing new machine learning algorithms. In the "big data" world supervised learning has been a solved problem since at least 1951 (see [FixHodges1951] for neighborhood density methods, see [GordonOlshen1978] for k-nearest neighbor and decision tree methods).