Statistical Learning



This is Jun and I'm a front-end developer at LINE. Recently the front-end development scene is booming with new technologies and it's becoming difficult to catch up with all those new things. As part of today's topic as "implementing machine learning in front-end web development", let me share my little experiment on machine learning, using TensorFlow.js. I used TensorFlow.js to build a simple classification model for browsers. Before I go into the details, I'll briefly introduce about TensorFlow and classification.

Teaching - CS 229


CS 229 ― Machine Learning My twin brother Afshine and I created this set of illustrated Machine Learning cheatsheets covering the content of the CS 229 class, which I TA-ed in Fall 2018 at Stanford. They can (hopefully!) be useful to all future students of this course as well as to anyone else interested in Machine Learning.

Biasing MCTS with Features for General Games Artificial Intelligence

This paper proposes using a linear function approximator, rather than a deep neural network (DNN), to bias a Monte Carlo tree search (MCTS) player for general games. This is unlikely to match the potential raw playing strength of DNNs, but has advantages in terms of generality, interpretability and resources (time and hardware) required for training. Features describing local patterns are used as inputs. The features are formulated in such a way that they are easily interpretable and applicable to a wide range of general games, and might encode simple local strategies. We gradually create new features during the same self-play training process used to learn feature weights. We evaluate the playing strength of an MCTS player biased by learnt features against a standard upper confidence bounds for trees (UCT) player in multiple different board games, and demonstrate significantly improved playing strength in the majority of them after a small number of self-play training games.

Empirical Evaluations of Seed Set Selection Strategies for Predictive Coding Artificial Intelligence

Training documents have a significant impact on the performance of predictive models in the legal domain. Yet, there is limited research that explores the effectiveness of the training document selection strategy - in particular, the strategy used to select the seed set, or the set of documents an attorney reviews first to establish an initial model. Since there is limited research on this important component of predictive coding, the authors of this paper set out to identify strategies that consistently perform well. Our research demonstrated that the seed set selection strategy can have a significant impact on the precision of a predictive model. Enabling attorneys with the results of this study will allow them to initiate the most effective predictive modeling process to comb through the terabytes of data typically present in modern litigation. This study used documents from four actual legal cases to evaluate eight different seed set selection strategies. Attorneys can use the results contained within this paper to enhance their approach to predictive coding.

On Sampling Random Features From Empirical Leverage Scores: Implementation and Theoretical Guarantees Artificial Intelligence

Random features provide a practical framework for large-scale kernel approximation and supervised learning. It has been shown that data-dependent sampling of random features using leverage scores can significantly reduce the number of features required to achieve optimal learning bounds. Leverage scores introduce an optimized distribution for features based on an infinite-dimensional integral operator (depending on input distribution), which is impractical to sample from. Focusing on empirical leverage scores in this paper, we establish an out-of-sample performance bound, revealing an interesting trade-off between the approximated kernel and the eigenvalue decay of another kernel in the domain of random features defined based on data distribution. Our experiments verify that the empirical algorithm consistently outperforms vanilla Monte Carlo sampling, and with a minor modification the method is even competitive to supervised data-dependent kernel learning, without using the output (label) information.

PCA and SVD explained with numpy


How exactly are principal component analysis and singular value decomposition related and how to implement using numpy. Principal component analysis (PCA) and singular value decomposition (SVD) are commonly used dimensionality reduction approaches in exploratory data analysis (EDA) and Machine Learning. They are both classical linear dimensionality reduction methods that attempt to find linear combinations of features in the original high dimensional data matrix to construct meaningful representation of the dataset. They are preferred by different fields when it comes to reducing the dimensionality: PCA are often used by biologists to analyze and visualize the source variances in datasets from population genetics, transcriptomics, proteomics and microbiome. Meanwhile, SVD, particularly its reduced version truncated SVD, is more popular in the field of natural language processing to achieve a representation of the gigantic while sparse word frequency matrices.

Natural language processing helps hospital predict downstream demand for imaging services


The authors said an automated method for predicting future imaging resource utilization could help streamline the process, paving the way for capacity management strategies that could help meet the increased but unpredictable demand for radiology services. Using data from all hepatocellular carcinoma (HCC) surveillance CT exams performed at their hospital between 2010 and 2017, they used open-source NLP and machine learning software to parse free-text radiology reports into bag-of-words and term frequency-inverse document frequency (TF-IDF) models. In NLP, bag-of-words refers to the frequency with which words occur in a report summary, while TF-IDF considers the number of times a word appears in the summary and measures the uniqueness of specific terms in the context of entire report collections. Brown and Kachura also used three machine learning techniques--logistic regression, support vector machine (SVM) and random forest--to make their predictions. As a whole, the authors found bag-of-words models were somewhat inferior to the TF-IDF approach, with the TF-IDF and SVM combination yielding the most favorable results.

Machine Learning Algorithms In Layman's Terms, Part 1


As a recent graduate of the Flatiron School's Data Science Bootcamp, I've been inundated with advice on how to ace technical interviews. A soft skill that keeps coming to the forefront is the ability to explain complex machine learning algorithms to a non-technical person. This series of posts is me sharing with the world how I would explain all the machine learning topics I come across on a regular my grandma. Some get a bit in-depth, others less so, but all I believe are useful to a non-Data Scientist. In the upcoming parts of this series, I'll be going over: To summarize, an algorithm is the mathematical life force behind a model.

Regularization techniques for Neural Networks


In our last post, we learned about feedforward neural networks and how to design them. In this post, we will learn how to tackle one of the most central problems that arise in the domain of machine learning, that is how to make our algorithm to find a perfect fit not only to the training set but also to the testing set. When an algorithm performs well on the training set but performs poorly on the testing set, the algorithm is said to be overfitted on the Training data. After all, our main goal is to perform well on never seen before data, ie reducing the overfitting. To tackle this problem we have to make our model generalize over the training data which is done using various regularization techniques which we will learn about in this post.

How to Automatically Determine the Number of Clusters in your Data - and more


Determining the number of clusters when performing unsupervised clustering is a tricky problem. Many data sets don't exhibit well separated clusters, and two human beings asked to visually tell the number of clusters by looking at a chart, are likely to provide two different answers. Sometimes clusters overlap with each other, and large clusters contain sub-clusters, making a decision not easy. For instance, how many clusters do you see in the picture below? What is the optimum number of clusters?