Nearest Neighbor Methods
30 Questions to test a data scientist on K-Nearest Neighbors (kNN)
If you were to ask me for the 2 most intuitive algorithms in machine learning, it would be k-Nearest Neighbours (kNN) and tree based algorithms. Both of them are simple to understand, easy to explain and perfect to demonstrate to people. Interestingly, we had skill tests for both these algorithms last month. If you are new to machine learning, make sure you test your understanding of both of these algorithms. They are simple, but immensely powerful and used extensively in industry.
Causal nearest neighbor rules for optimal treatment regimes
Zhou, Xin, Kosorok, Michael R.
The estimation of optimal treatment regimes is of considerable interest to precision medicine. In this work, we propose a causal $k$-nearest neighbor method to estimate the optimal treatment regime. The method is rooted in the framework of causal inference, and estimates the causal treatment effects within the nearest neighborhood. Although the method is simple, it possesses nice theoretical properties. We show that the causal $k$-nearest neighbor regime is universally consistent. That is, the causal $k$-nearest neighbor regime will eventually learn the optimal treatment regime as the sample size increases. We also establish its convergence rate. However, the causal $k$-nearest neighbor regime may suffer from the curse of dimensionality, i.e., performance deteriorates as dimensionality increases. To alleviate this problem, we develop an adaptive causal $k$-nearest neighbor method to perform metric selection and variable selection simultaneously. The performance of the proposed methods is illustrated in simulation studies and in an analysis of a chronic depression clinical trial.
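As a rough illustration of estimating treatment effects inside a nearest neighborhood, the sketch below compares local mean outcomes of treated and control subjects around a query point and recommends the arm with the larger mean. It is not the authors' estimator: it assumes a randomized binary treatment, a single covariate, and that larger outcomes are better. The function names and the synthetic data are hypothetical.

```python
import random


def knn_arm_mean(x, samples, k):
    """Mean outcome of the k samples whose covariate is closest to x."""
    nearest = sorted(samples, key=lambda s: abs(s[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def causal_knn_rule(x, data, k):
    """Recommend the arm (0 or 1) with the larger local mean outcome."""
    treated = [(xi, yi) for xi, ai, yi in data if ai == 1]
    control = [(xi, yi) for xi, ai, yi in data if ai == 0]
    effect = knn_arm_mean(x, treated, k) - knn_arm_mean(x, control, k)
    return (1 if effect > 0 else 0), effect


# Synthetic randomized trial (assumption): treatment helps only when x > 0.5.
rng = random.Random(1)
data = []
for _ in range(400):
    x = rng.random()
    a = rng.randint(0, 1)              # coin-flip treatment assignment
    true_effect = 2.0 * (x - 0.5)
    y = a * true_effect + rng.gauss(0.0, 0.1)
    data.append((x, a, y))

for query in (0.1, 0.9):
    arm, effect = causal_knn_rule(query, data, k=20)
    print(f"x={query}: recommend arm {arm}, estimated effect {effect:+.2f}")
```

With several covariates the absolute difference would be replaced by a distance over the whole covariate vector, which is where the curse of dimensionality mentioned in the abstract starts to bite.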
Randomized Near Neighbor Graphs, Giant Components, and Applications in Data Science
Linderman, George C., Mishne, Gal, Kluger, Yuval, Steinerberger, Stefan
If we pick $n$ random points uniformly in $[0,1]^d$ and connect each point to its $k$-nearest neighbors, then it is well known that there exists a giant connected component with high probability. We prove that in $[0,1]^d$ it suffices to connect every point to $c_{d,1} \log{\log{n}}$ points chosen randomly among its $c_{d,2} \log{n}$-nearest neighbors to ensure a giant component of size $n - o(n)$ with high probability. This construction yields a much sparser random graph with $\sim n \log\log{n}$ instead of $\sim n \log{n}$ edges that has comparable connectivity properties. This result has nontrivial implications for problems in data science where an affinity matrix is constructed: instead of picking the $k$-nearest neighbors, one can often pick $k' \ll k$ random points out of the $k$-nearest neighbors without sacrificing efficiency. This can massively simplify and accelerate computation; we illustrate this with several numerical examples.
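The construction can be sketched in a few lines: sample points uniformly in the unit cube, connect each point to $k'$ random picks among its $k$ nearest neighbors, and measure the largest connected component. The values of `k` and `k_prime` below are illustrative stand-ins, since the constants $c_{d,1}$ and $c_{d,2}$ are not given here, and the brute-force neighbor search and function names are assumptions made for the example.

```python
import math
import random


def random_near_neighbor_graph(points, k, k_prime, rng):
    """Connect each point to k_prime random picks among its k nearest neighbors."""
    n = len(points)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        # Brute-force ranking of all other points by distance to point i.
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(points[i], points[j]),
        )
        nearest = others[:k]
        for j in rng.sample(nearest, k_prime):
            adjacency[i].add(j)
            adjacency[j].add(i)  # treat edges as undirected
    return adjacency


def largest_component_size(adjacency):
    """Size of the largest connected component, found by depth-first search."""
    seen = set()
    best = 0
    for start in adjacency:
        if start in seen:
            continue
        stack = [start]
        seen.add(start)
        size = 0
        while stack:
            node = stack.pop()
            size += 1
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        best = max(best, size)
    return best


rng = random.Random(0)
n, d = 300, 2
points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
k = 12       # illustrative stand-in for c_{d,2} log n
k_prime = 3  # illustrative stand-in for c_{d,1} log log n
graph = random_near_neighbor_graph(points, k, k_prime, rng)
edges = sum(len(nbrs) for nbrs in graph.values()) // 2
giant = largest_component_size(graph)
print(f"edges: {edges}, largest component: {giant} of {n}")
```

The graph has at most `n * k_prime` edges, compared with up to `n * k` for the full $k$-nearest neighbor graph, which is the source of the savings described above.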
Making your First Machine Learning Classifier in Scikit-learn (Python) Codementor
One of the most amazing things about Python's scikit-learn library is that it has a 4-step modeling pattern that makes it easy to code a machine learning classifier. While this tutorial uses a classifier called Logistic Regression, the coding process in this tutorial applies to other classifiers in sklearn (Decision Tree, K-Nearest Neighbors etc). In this tutorial, we use Logistic Regression to predict digit labels based on images. The image above shows a bunch of training digits (observations) from the MNIST dataset whose category membership is known (labels 0-9). After training a model with logistic regression, it can be used to predict an image label (labels 0-9) given an image. The first part of this tutorial post goes over a toy dataset (digits dataset) to quickly illustrate scikit-learn's 4-step modeling pattern and show the behavior of the logistic regression algorithm.
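The pattern is usually summarized as: import the model class, make an instance, fit it on labeled data, and predict on new data. The sketch below walks through those steps with a dependency-free, hypothetical `TinyKNN` class that only mimics the `fit`/`predict` interface; it is not part of scikit-learn, where the analogous classifier is `KNeighborsClassifier` with its `n_neighbors` parameter. The toy points and labels are made up for the example.

```python
import math
from collections import Counter


class TinyKNN:
    """Hypothetical stand-in that mimics the scikit-learn fit/predict interface."""

    def __init__(self, n_neighbors=3):
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        # kNN is a lazy learner: "training" only stores the data.
        self.X_, self.y_ = list(X), list(y)
        return self

    def predict(self, X):
        predictions = []
        for query in X:
            # Rank stored observations by Euclidean distance to the query.
            ranked = sorted(
                zip(self.X_, self.y_),
                key=lambda pair: math.dist(query, pair[0]),
            )
            # Majority vote among the closest n_neighbors labels.
            votes = Counter(label for _, label in ranked[: self.n_neighbors])
            predictions.append(votes.most_common(1)[0][0])
        return predictions


# Step 1: import (here: define) the model class.
# Step 2: make an instance and set its hyperparameters.
model = TinyKNN(n_neighbors=3)

# Step 3: train on observations whose labels are known.
X_train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y_train = ["a", "a", "a", "b", "b", "b"]
model.fit(X_train, y_train)

# Step 4: predict labels for new observations.
predicted = model.predict([(0.05, 0.05), (0.95, 1.0)])
print(predicted)
```

Because every classifier exposes the same two methods, swapping one model for another typically changes only the import and the constructor call.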
sameermahajan/MLWorkshop
This fast paced hands on workshop is designed to bootstrap your Deep Learning. It introduces algorithms like k Nearest Neighbors, k means, recommender systems etc. It brings in tools like python for quick coding, pandas and numpy for data munging, matplotlib for visualization, scikit-learn for ready made machine learning algorithms. It does so with real life use cases like predicting house sale prices, sentiment analysis using restaurant reviews; real life data like people wikipedia, adult income data etc. and lots of hands on coding. We dive into the intuition behind popular algorithms such as gradient descent and forward and backward propagation in neural networks.
Creating Next Gen Log Analysis with AI - DRAFT
Stage 2 - Use TensorFlow / Keras based AI and ML to detect patterns in data. We used the K-Nearest Neighbors algorithm to identify and classify patterns. Stage 3 - This is a people centric stage where the output from Stage 2 is reviewed with BI / Admins and System admins along with Business Stewards to help identify which types of errors affect organizations more (this stage ensures that the classification priority of errors is organization specific and not generic). Stage 4 - Learn again from the tags in Stage 3 and build and distribute models. We used an SVM model this time to classify errors as severity 1-5 (target label 0-4 in multi-class classification).
Active Tolerant Testing
In this work, we give the first algorithms for tolerant testing of nontrivial classes in the active model: estimating the distance of a target function to a hypothesis class C with respect to some arbitrary distribution D, using only a small number of label queries to a polynomial-sized pool of unlabeled examples drawn from D. Specifically, we show that for the class C of unions of d intervals on the line, we can estimate the error rate of the best hypothesis in the class to an additive error epsilon from only $O(\frac{1}{\epsilon^6}\log \frac{1}{\epsilon})$ label queries to an unlabeled pool of size $O(\frac{d}{\epsilon^2}\log \frac{1}{\epsilon})$. The key point here is that the number of labels needed is independent of the VC-dimension of the class. This extends the work of Balcan et al. [2012], who solved the non-tolerant testing problem for this class (distinguishing the zero-error case from the case that the best hypothesis in the class has error greater than epsilon). We also consider the related problem of estimating the performance of a given learning algorithm A in this setting. That is, given a large pool of unlabeled examples drawn from distribution D, can we, from only a few label queries, estimate how well A would perform if the entire dataset were labeled? We focus on k-Nearest Neighbor style algorithms, and also show how our results can be applied to the problem of hyperparameter tuning (selecting the best value of k for the given learning problem).
Rate-optimal Meta Learning of Classification Error
Iranzad, Morteza Noshad, Hero, Alfred O. III
Meta learning of optimal classifier error rates allows an experimenter to empirically estimate the intrinsic ability of any estimator to discriminate between two populations, circumventing the difficult problem of estimating the optimal Bayes classifier. To this end we propose a weighted nearest neighbor (WNN) graph estimator for a tight bound on the Bayes classification error: the Henze-Penrose (HP) divergence. Similar to recently proposed HP estimators [berisha2016], the proposed estimator is non-parametric and does not require density estimation. However, unlike previous approaches, the proposed estimator is rate-optimal, i.e., its mean squared estimation error (MSEE) decays to zero at the fastest possible rate of $O(1/M+1/N)$, where $M,N$ are the sample sizes of the respective populations. We illustrate the proposed WNN meta estimator for several simulated and real data sets.
Potential Conditional Mutual Information: Estimators, Properties and Applications
Rahimzamani, Arman, Kannan, Sreeram
The conditional mutual information I(X;Y|Z) measures the average information that X and Y contain about each other given Z. This is an important primitive in many learning problems including conditional independence testing, graphical model inference, causal strength estimation and time-series problems. In several applications, it is desirable to have a functional purely of the conditional distribution p_{Y|X,Z} rather than of the joint distribution p_{X,Y,Z}. We define the potential conditional mutual information as the conditional mutual information calculated with a modified joint distribution p_{Y|X,Z} q_{X,Z}, where q_{X,Z} is a potential distribution, fixed a priori. We develop k-nearest neighbor based estimators for this functional, employing importance sampling and a coupling trick, and prove the finite-k consistency of such an estimator. We demonstrate that the estimator has excellent practical performance and show an application in dynamical system inference.