Nearest Neighbor Methods
Differentiable Subset Sampling
Xie, Sang Michael, Ermon, Stefano
Many machine learning tasks require sampling a subset of items from a collection. Due to the non-differentiability of subset sampling, the procedure is usually not included in end-to-end deep learning models. We show that through a connection to weighted reservoir sampling, the Gumbel-max trick can be extended to produce exact subset samples, and that a recently proposed top-k relaxation can be used to differentiate through the subset sampling procedure. We test our method on end-to-end tasks requiring subset sampling, including a differentiable k-nearest neighbors task and an instance-wise feature selection task for model interpretability.
Machine learning with the "diabetes" data set in R – Towards Data Science
We'll begin by applying the k-nearest neighbors method of classifying patients by their similarity to other patients. For this method (and all subsequent methods), we'll start by separating the data set into "training" and "test" sets. We'll build our model based on the relationship between the predictors and the outcome on the training set, then use the model's specifications to predict the outcome on the test set. We can then compare our predicted outcomes to the test set's actual diabetes status to give us a measure of model accuracy. For k-nearest neighbors, we compute the outcome for each test case by comparing that case to the "nearest neighbors" in the training set.
Learning Sublinear-Time Indexing for Nearest Neighbor Search
Dong, Yihe, Indyk, Piotr, Razenshteyn, Ilya, Wagner, Tal
Most of the efficient sublinear-time indexing algorithms for the high-dimensional nearest neighbor search problem (NNS) are based on space partitions of the ambient space $\mathbb{R}^d$. Inspired by recent theoretical work on NNS for general metric spaces [Andoni, Naor, Nikolov, Razenshteyn, Waingarten STOC 2018, FOCS 2018], we develop a new framework for constructing such partitions that reduces the problem to balanced graph partitioning followed by supervised classification. We instantiate this general approach with the KaHIP graph partitioner [Sanders, Schulz SEA 2013] and neural networks, respectively, to obtain a new partitioning procedure called Neural Locality-Sensitive Hashing (Neural LSH). On several standard benchmarks for NNS, our experiments show that the partitions found by Neural LSH consistently outperform partitions found by quantization- and tree-based methods.
Transfer Learning for Image-Based Malware Classification
Bhodia, Niket, Prajapati, Pratikkumar, Di Troia, Fabio, Stamp, Mark
In this paper, we consider the problem of malware detection and classification based on image analysis. We convert executable files to images and apply image recognition using deep learning (DL) models. To train these models, we employ transfer learning based on existing DL models that have been pre-trained on massive image datasets. We carry out various experiments with this technique and compare its performance to that of an extremely simple machine learning technique, namely, k-nearest neighbors (\kNN). For our k-NN experiments, we use features extracted directly from executables, rather than image analysis. While our image-based DL technique performs well in the experiments, surprisingly, it is outperformed by k-NN. We show that DL models are better able to generalize the data, in the sense that they outperform k-NN in simulated zero-day experiments.
Natively Interpretable Machine Learning and Artificial Intelligence: Preliminary Results and Future Directions
Hazard, Christopher J., Fusting, Christopher, Resnick, Michael, Auerbach, Michael, Meehan, Michael, Korobov, Valeri
Machine learning models have become more and more complex in order to better approximate complex functions. Although fruitful in many domains, the added complexity has come at the cost of model interpretability. The once popular k-nearest neighbors (kNN) approach, which finds and uses the most similar data for reasoning, has received much less attention in recent decades due to numerous problems when compared to other techniques. We show that many of these historical problems with kNN can be overcome, and our contribution has applications not only in machine learning but also in online learning, data synthesis, anomaly detection, model compression, and reinforcement learning, without sacrificing interpretability. We introduce a synthesis between kNN and information theory that we hope will provide a clear path towards models that are innately interpretable and auditable. Through this work we hope to gather interest in combining kNN with information theory as a promising path to fully auditable machine learning and artificial intelligence.
A GA-based feature selection of the EEG signals by classification evaluation: Application in BCI systems
Eslahi, Samira Vafay, Dabanloo, Nader Jafarnia, Maghooli, Keivan
In electroencephalogram (EEG) signal processing, finding the appropriate information from a dataset has been a big challenge for successful signal classification. The feature selection methods make it possible to solve this problem; however, the method selection is still under investigation to find out which feature can perform the best to extract the most proper features of the signal to improve the classification performance. In this study, we use the genetic algorithm (GA), a heuristic searching algorithm, to find the optimum combination of the feature extraction methods and the classifiers, in the brain-computer interface (BCI) applications. A BCI system can be practical if and only if it performs with high accuracy and high speed alongside each other. In the proposed method, GA performs as a searching engine to find the best combination of the features and classifications. The features used here are Katz, Higuchi, Petrosian, Sevcik, and box-counting dimension (BCD) feature extraction methods. These features are applied to the wavelet subbands and are classified with four classifiers such as adaptive neuro-fuzzy inference system (ANFIS), fuzzy k-nearest neighbors (FKNN), support vector machine (SVM) and linear discriminant analysis (LDA). Due to the huge number of features, the GA optimization is used to find the features with the optimum fitness value (FV). Results reveal that Katz fractal feature estimation method with LDA classification has the best FV. Consequently, due to the low computation time of the first Daubechies wavelet transformation in comparison to the original signal, the final selected methods contain the fractal features of the first coefficient of the detail subbands.
Machine Learning in Python - PyImageSearch
Struggling to get started with machine learning using Python? In this step-by-step, hands-on tutorial you will learn how to perform machine learning using Python on numerical data and image data. By the time you are finished reading this post, you will be able to get your start in machine learning. To launch your machine learning in Python education, just keep reading! Inside this tutorial, you will learn how to perform machine learning in Python on numerical data and image data. Using this technique you will be able to get your start with machine learning and Python! Along the way, you'll discover popular machine learning algorithms that you can use in your own projects as well, including: This hands-on experience will give you the knowledge (and confidence) you need to apply machine learning in Python to your own projects. Before we can get started with this tutorial you first need to make sure your system is configured for machine learning. Today's code requires the following libraries: In order to help you gain experience performing machine learning in Python, we'll be working with two separate datasets. The first one, the Iris dataset, is the machine learning practitioner's equivalent of "Hello, World!" (likely one of the first pieces of software you wrote when learning how to program). The second dataset, 3-scenes, is an example image dataset I put together -- this dataset will help you gain experience working with image data, and most importantly, learn what techniques work best for numerical/categorical datasets vs. image datasets. Let's go ahead and get a more intimate look at these datasets.
Estimating physical properties from liquid crystal textures via machine learning and complexity-entropy methods
Sigaki, H. Y. D., de Souza, R. F., de Souza, R. T., Zola, R. S., Ribeiro, H. V.
Optical imaging techniques are important tools extensively used for probing a number of materials properties [1]. These imaging techniques are non-destructive and particularly convenient for dealing with biological and other complex materials [2]. Liquid crystals are among these materials widely studied via optical and image processing methods [3]. This occurs because liquid crystals are birefringent materials, and as such, simple polarized optical microscope imaging already access some of their important properties, including birefringence and sample thickness [4]. Moreover, this technique estimates the local ordering properties (for instance, the director distribution) across a sample when coupled with variable retarders and different algorithms for fast and sensitive measurements [5]. This approach is known as LC-PolScope [6] and has been used for fine imaging of defect cores in lyotropic liquid crystals [7] and can describe the orientational order of active nematics [8]. Despite the extensive use of optical imaging approaches in the study of liquid crystals [9-12], much less attention has been paid to the problem of extracting physical parameters directly from images of these materials. This is an important issue since several physical parameters of liquid crystals are only obtained by adjusting theoretical models to cumbersome and time demanding experimental results. Examples include the microscopic order parameter, from which several other parameters characterizing the nematic phase are dependent [3], and the pitch length of cholesteric liquid crystals.
Overfitting or perfect fitting? Risk bounds for classification and regression rules that interpolate
Belkin, Mikhail, Hsu, Daniel J., Mitra, Partha
Many modern machine learning models are trained to achieve zero or near-zero training error in order to obtain near-optimal (but non-zero) test error. This phenomenon of strong generalization performance for ``overfitted'' / interpolated classifiers appears to be ubiquitous in high-dimensional data, having been observed in deep networks, kernel machines, boosting and random forests. Their performance is consistently robust even when the data contain large amounts of label noise. Very little theory is available to explain these observations. The vast majority of theoretical analyses of generalization allows for interpolation only when there is little or no label noise. This paper takes a step toward a theoretical foundation for interpolated classifiers by analyzing local interpolating schemes, including geometric simplicial interpolation algorithm and singularly weighted $k$-nearest neighbor schemes. Consistency or near-consistency is proved for these schemes in classification and regression problems. Moreover, the nearest neighbor schemes exhibit optimal rates under some standard statistical assumptions. Finally, this paper suggests a way to explain the phenomenon of adversarial examples, which are seemingly ubiquitous in modern machine learning, and also discusses some connections to kernel machines and random forests in the interpolated regime.
The Nearest Neighbor Information Estimator is Adaptively Near Minimax Rate-Optimal
Jiao, Jiantao, Gao, Weihao, Han, Yanjun
We analyze the Kozachenko–Leonenko (KL) fixed k-nearest neighbor estimator for the differential entropy. We obtain the first uniform upper bound on its performance for any fixed k over H\"{o}lder balls on a torus without assuming any conditions on how close the density could be from zero. Accompanying a recent minimax lower bound over the H\"{o}lder ball, we show that the KL estimator for any fixed k is achieving the minimax rates up to logarithmic factors without cognizance of the smoothness parameter s of the H\"{o}lder ball for $s \in (0,2]$ and arbitrary dimension d, rendering it the first estimator that provably satisfies this property.