Performance Analysis


Validation techniques beyond K-fold

#artificialintelligence

A validation dataset is a sample of data held back from training your model that is used to estimate model skill while tuning the model's hyperparameters. The validation dataset is different from the test dataset, which is also held back from training but is instead used to give an unbiased estimate of the skill of the final tuned model when comparing or selecting between final models. There is much confusion in applied machine learning about what a validation dataset is exactly and how it differs from a test dataset. Validation techniques in machine learning are used to estimate the model's error rate, which can be treated as an approximation of the true error rate on the wider population. If the available data is large enough to be representative of that population, you may not need these validation techniques.
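
A minimal sketch of the distinction, assuming scikit-learn, a synthetic dataset, and logistic regression purely for illustration: a validation split (or k-fold) guides hyperparameter tuning, while the test set is touched only once at the end.

```python
# Sketch only: dataset, model, and hyperparameter grid are illustrative assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split, KFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=1000, n_features=20, random_state=1)

# Hold back a test set for the final, unbiased estimate of the tuned model.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
# Carve a validation set out of the remaining data for hyperparameter tuning.
X_fit, X_val, y_fit, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=1)

best_c, best_score = None, -1.0
for c in (0.01, 0.1, 1.0, 10.0):
    model = LogisticRegression(C=c, max_iter=1000).fit(X_fit, y_fit)
    score = accuracy_score(y_val, model.predict(X_val))
    if score > best_score:
        best_c, best_score = c, score

# Alternatively, estimate skill with k-fold cross-validation on the training data.
kfold = KFold(n_splits=5, shuffle=True, random_state=1)
scores = []
for train_ix, val_ix in kfold.split(X_train):
    model = LogisticRegression(C=best_c, max_iter=1000).fit(X_train[train_ix], y_train[train_ix])
    scores.append(accuracy_score(y_train[val_ix], model.predict(X_train[val_ix])))
print('Best C=%.2f, CV accuracy=%.3f' % (best_c, sum(scores) / len(scores)))

# Final, unbiased estimate on the held-back test set.
final = LogisticRegression(C=best_c, max_iter=1000).fit(X_train, y_train)
print('Test accuracy=%.3f' % accuracy_score(y_test, final.predict(X_test)))
```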


Machine Learning for ISIC Skin Cancer Classification Challenge

#artificialintelligence

Computer vision based melanoma diagnosis has been a side project of mine on and off for almost 2 years now, so I plan on making this the first of a short series of posts on the topic. This post is intended as a quick, informative read for those with basic machine learning experience looking for an introduction to the ISIC problem, and for those just getting out of their first or second machine learning/data mining course who'd like a simple problem to get their hands dirty with. Tools for early diagnosis of different diseases are a major reason machine learning has a lot of people excited today. The process for these innovations is a long one: labeled datasets need to be built, engineers and data scientists need to be trained, and each problem comes with its own set of edge cases that often make building robust classifiers very tricky (even for the experts). Here I'm going to focus on building a classifier.


How to Fix k-Fold Cross-Validation for Imbalanced Classification

#artificialintelligence

Model evaluation involves using the available dataset to fit a model and estimate its performance when making predictions on unseen examples. It is a challenging problem as both the training dataset used to fit the model and the test set used to evaluate it must be sufficiently large and representative of the underlying problem so that the resulting estimate of model performance is not too optimistic or pessimistic. The two most common approaches used for model evaluation are the train/test split and the k-fold cross-validation procedure. Both approaches can be very effective in general, although they can result in misleading results and potentially fail when used on classification problems with a severe class imbalance. In this tutorial, you will discover how to evaluate classifier models on imbalanced datasets.
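
One common remedy is stratified k-fold cross-validation, which preserves the class ratio in every fold. A minimal sketch is below; scikit-learn, the decision tree, and the synthetic 1% minority class are assumptions for illustration, not the tutorial's own setup.

```python
# Sketch only: with a 1% minority class, an unstratified fold may contain very few
# (or no) positives, which is exactly the failure mode stratification avoids.
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold, StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, weights=[0.99], flip_y=0, random_state=1)
model = DecisionTreeClassifier()

naive = cross_val_score(model, X, y, scoring='f1',
                        cv=KFold(n_splits=10, shuffle=True, random_state=1))
strat = cross_val_score(model, X, y, scoring='f1',
                        cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=1))
print('Plain k-fold F1: %.3f, Stratified k-fold F1: %.3f' % (mean(naive), mean(strat)))
```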


Tour of Evaluation Metrics for Imbalanced Classification

#artificialintelligence

A classifier is only as good as the metric used to evaluate it. If you choose the wrong metric to evaluate your models, you are likely to choose a poor model, or in the worst case, be misled about the expected performance of your model. Choosing an appropriate metric is challenging in applied machine learning generally, but is particularly difficult for imbalanced classification problems: firstly, because most of the widely used standard metrics assume a balanced class distribution, and secondly, because in imbalanced classification not all classes, and therefore not all prediction errors, are equally important. In this tutorial, you will discover metrics that you can use for imbalanced classification.
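
As a small illustration of why the choice matters, the sketch below contrasts accuracy with metrics that account for the minority class; scikit-learn, the synthetic 1:99 dataset, and logistic regression are assumptions here, not the article's examples.

```python
# Sketch only: on a 1:99 split, accuracy looks excellent even when the minority
# class is handled poorly; precision, recall, F1, and ROC AUC tell a fuller story.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score)

X, y = make_classification(n_samples=10000, weights=[0.99], flip_y=0, random_state=4)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=4)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_hat = model.predict(X_test)
y_prob = model.predict_proba(X_test)[:, 1]

print('Accuracy : %.3f' % accuracy_score(y_test, y_hat))   # inflated by the majority class
print('Precision: %.3f' % precision_score(y_test, y_hat))
print('Recall   : %.3f' % recall_score(y_test, y_hat))
print('F1       : %.3f' % f1_score(y_test, y_hat))
print('ROC AUC  : %.3f' % roc_auc_score(y_test, y_prob))
```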


What Is the Naive Classifier for Each Imbalanced Classification Metric?

#artificialintelligence

A common mistake made by beginners is to apply machine learning algorithms to a problem without establishing a performance baseline. A performance baseline provides a minimum score above which a model is considered to have skill on the dataset. It also provides a point of relative improvement for all models evaluated on the dataset. A baseline can be established using a naive classifier, such as predicting one class label for all examples in the test dataset. Another common mistake made by beginners is using classification accuracy as a performance metric on problems that have an imbalanced class distribution.
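
A minimal sketch of scoring such a naive baseline with scikit-learn's DummyClassifier follows; the dataset and the choice of strategies are assumptions for illustration, and which strategy is appropriate depends on the metric, which is the article's subject.

```python
# Sketch only: score a few naive strategies to establish a floor that any
# skillful model must beat on the same data and metric.
from numpy import mean
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

X, y = make_classification(n_samples=1000, weights=[0.95], flip_y=0, random_state=2)
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=2)

# Predicting the majority class maximizes accuracy; other metrics call for other strategies.
for strategy in ('most_frequent', 'stratified', 'prior'):
    baseline = DummyClassifier(strategy=strategy)
    scores = cross_val_score(baseline, X, y, scoring='accuracy', cv=cv)
    print('%-13s accuracy: %.3f' % (strategy, mean(scores)))
```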


Optimal ROC Curve for a Combination of Classifiers

Neural Information Processing Systems

We present a new analysis for the combination of binary classifiers. We propose a theoretical framework based on the Neyman-Pearson lemma to analyze combinations of classifiers. In particular, we give a method for finding the optimal decision rule for a combination of classifiers and prove that it has the optimal ROC curve. We also show how our method generalizes and improves on previous work on combining classifiers and generating ROC curves.
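
For readers who want to experiment, the sketch below simply compares ROC curves for two individual classifiers and a naive probability-averaging combination. It is not the paper's Neyman-Pearson-optimal rule; scikit-learn, the models, and the synthetic data are all assumptions.

```python
# Sketch only: averaging predicted probabilities is a crude combination used here
# purely to show how to compare ROC curves of individual vs combined scores.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import roc_curve, roc_auc_score

X, y = make_classification(n_samples=5000, n_features=20, random_state=3)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=3)

p1 = LogisticRegression(max_iter=1000).fit(X_train, y_train).predict_proba(X_test)[:, 1]
p2 = GaussianNB().fit(X_train, y_train).predict_proba(X_test)[:, 1]
p_comb = (p1 + p2) / 2.0  # naive combination, for comparison only

for name, scores in (('logistic', p1), ('naive bayes', p2), ('combined', p_comb)):
    fpr, tpr, _ = roc_curve(y_test, scores)
    print('%-12s AUC = %.3f (%d thresholds)' % (name, roc_auc_score(y_test, scores), len(fpr)))
```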


Predicting response time and error rates in visual search

Neural Information Processing Systems

A model of human visual search is proposed. It predicts both response time (RT) and error rates (ER) as a function of image parameters such as target contrast and clutter. The model is an ideal observer, in that it optimizes the Bayes ratio of target present vs target absent. The ratio is computed on the firing pattern of V1/V2 neurons, modeled by Poisson distributions. The optimal mechanism for integrating information over time is shown to be a 'soft max' of diffusions, computed over the visual field by 'hypercolumns' of neurons that share the same receptive field and have different response properties to image features.
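
To make the Bayes-ratio idea concrete, here is a minimal sketch of a log-likelihood ratio for target present vs absent given Poisson spike counts; the rates and counts are illustrative assumptions, not the paper's fitted V1/V2 parameters.

```python
# Sketch only: for independent Poisson observations, the log-likelihood ratio
# is a sum of per-observation terms; a positive total favors "target present".
import numpy as np
from scipy.stats import poisson

rate_present = 8.0   # assumed mean firing rate when the target is present
rate_absent = 5.0    # assumed mean firing rate when the target is absent

counts = np.array([7, 9, 4, 11, 6])  # illustrative spike counts from one unit over time

llr = np.sum(poisson.logpmf(counts, rate_present) - poisson.logpmf(counts, rate_absent))
print('log-likelihood ratio = %.3f -> decide %s' % (llr, 'present' if llr > 0 else 'absent'))
```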


AI Could Make Up For Lack Of Radiologists In Fight Against Breast Cancer, But It Isn't Ready Yet

#artificialintelligence

Recently a team of researchers from Imperial College London and Google Health created a computer vision model intended to diagnose cases of breast cancer from X-rays. As CNN reports, the model was trained on X-rays of over 29,000 women, and when pitted against six radiologists it outperformed the assessments of the doctors. Currently, the NHS uses the combined decisions of two doctors to diagnose breast cancer from X-rays. If the two doctors end up disagreeing, a third is brought in to consult on the images. While the doctors had access to the medical records of the patients, the AI device only had the mammograms to base its decisions on.


Bootstrapping from Game Tree Search

Neural Information Processing Systems

In this paper we introduce a new algorithm for updating the parameters of a heuristic evaluation function, by updating the heuristic towards the values computed by an alpha-beta search. Our algorithm differs from previous approaches to learning from search, such as Samuel's checkers player and the TD-Leaf algorithm, in two key ways. First, we update all nodes in the search tree, rather than a single node. Second, we use the outcome of a deep search, instead of the outcome of a subsequent search, as the training signal for the evaluation function. We implemented our algorithm in the chess program Meep, using a linear heuristic function.
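
A generic sketch of the core update, under stated assumptions: a linear evaluation over hand-made features is nudged toward values supplied by a deep search. The feature extractor and the search values are stubs here, not the paper's chess program Meep or its alpha-beta implementation.

```python
# Sketch only: gradient step on squared error between the static evaluation of a
# node and the value a deep search assigned to that node.
import numpy as np

def features(position):
    # Hypothetical feature extractor; a real engine would encode material, mobility, etc.
    return np.asarray(position, dtype=float)

def update_towards_search(weights, positions, search_values, lr=0.01):
    """One pass of bootstrapping: move each node's static evaluation toward the
    value computed for it by a (stubbed) alpha-beta search."""
    for position, v_search in zip(positions, search_values):
        phi = features(position)
        error = v_search - float(np.dot(weights, phi))
        weights = weights + lr * error * phi
    return weights

# Toy usage with 3-dimensional feature vectors standing in for search-tree nodes.
w = np.zeros(3)
nodes = [[1.0, 0.0, 2.0], [0.5, 1.0, 0.0], [2.0, 1.0, 1.0]]
search_vals = [0.8, -0.2, 1.1]  # pretend outputs of a deep alpha-beta search
print('updated weights:', update_towards_search(w, nodes, search_vals))
```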


Correlated Bigram LSA for Unsupervised Language Model Adaptation

Neural Information Processing Systems

We propose using correlated bigram LSA for unsupervised LM adaptation for automatic speech recognition. The model is trained using efficient variational EM and smoothed using the proposed fractional Kneser-Ney smoothing, which handles fractional counts. Our approach scales to large training corpora via bootstrapping of bigram LSA from unigram LSA. For LM adaptation, unigram and bigram LSA are integrated into the background N-gram LM via marginal adaptation and linear interpolation, respectively. Experimental results show that applying unigram and bigram LSA together yields a 6%--8% relative perplexity reduction and a 0.6% absolute character error rate (CER) reduction compared to applying only unigram LSA on the Mandarin RT04 test set.
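
A minimal sketch of the linear-interpolation step, blending an adapted LM's conditional probabilities with a background N-gram LM; the toy distributions and interpolation weight are illustrative assumptions, and the paper's marginal adaptation for unigram LSA is not shown.

```python
# Sketch only: P(w | h) = lam * P_background(w | h) + (1 - lam) * P_adapted(w | h)
def interpolate(p_background, p_adapted, lam=0.5):
    vocab = set(p_background) | set(p_adapted)
    return {w: lam * p_background.get(w, 0.0) + (1.0 - lam) * p_adapted.get(w, 0.0)
            for w in vocab}

# Toy conditional distributions P(w | h) for a single history h.
background = {'stock': 0.2, 'market': 0.5, 'weather': 0.3}
adapted = {'stock': 0.45, 'market': 0.45, 'weather': 0.1}  # topic-shifted by the adapted LM
print(interpolate(background, adapted, lam=0.6))
```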