Goto

Collaborating Authors

 holdout method


Model evaluation, model selection, and algorithm selection in machine learning

#artificialintelligence

A single-PDF version of Model Evaluation parts 1-4 is available on arXiv: https://arxiv.org/abs/1811.12808 Almost every machine learning algorithm comes with a large number of settings that we, the machine learning researchers and practitioners, need to specify. These tuning knobs, the so-called hyperparameters, help us control the behavior of machine learning algorithms when optimizing for performance, finding the right balance between bias and variance. Hyperparameter tuning for performance optimization is an art in itself, and there are no hard-and-fast rules that guarantee best performance on a given dataset. In Part I and Part II, we saw different holdout and bootstrap techniques for estimating the generalization performance of a model. We learned about the bias-variance trade-off, and we computed the uncertainty of our estimates. In this third part, we will focus on different methods of cross-validation for model evaluation and model selection. We will use these cross-validation techniques to rank models from several hyperparameter configurations and estimate how well they generalize to independent datasets.


Model Evaluation, Model Selection, and Algorithm Selection in Machine Learning

arXiv.org Machine Learning

The correct use of model evaluation, model selection, and algorithm selection techniques is vital in academic machine learning research as well as in many industrial settings. This article reviews different techniques that can be used for each of these three subtasks and discusses the main advantages and disadvantages of each technique with references to theoretical and empirical studies. Further, recommendations are given to encourage best yet feasible practices in research and applications of machine learning. Common methods such as the holdout method for model evaluation and selection are covered, which are not recommended when working with small datasets. Different flavors of the bootstrap technique are introduced for estimating the uncertainty of performance estimates, as an alternative to confidence intervals via normal approximation if bootstrapping is computationally feasible. Common cross-validation techniques such as leave-one-out cross-validation and k-fold cross-validation are reviewed, the bias-variance trade-off for choosing k is discussed, and practical tips for the optimal choice of k are given based on empirical evidence. Different statistical tests for algorithm comparisons are presented, and strategies for dealing with multiple comparisons such as omnibus tests and multiple-comparison corrections are discussed. Finally, alternative methods for algorithm selection, such as the combined F-test 5x2 cross-validation and nested cross-validation, are recommended for comparing machine learning algorithms when datasets are small.


Real-World Machine Learning: Model Evaluation & Optimization

#artificialintelligence

The primary goal of supervised machine learning is accurate prediction. We want our ML model to be as accurate as possible when predicting on new data (for which the target variable is unknown). Said in a different way, we want our models, which have been built from some training data, to generalize well to new data. That way, when we deploy the model in production, we can be assured that the predictions generated are of high quality. Therefore, when we evaluate the performance of a model, we want to determine how well that model will perform on new data.


3 ideas to add to your data science toolkit

#artificialintelligence

I'm always on the lookout for ideas that can improve how I tackle data analysis projects. I particularly favor approaches that translate to tools I can use repeatedly. Most of the time, I find these tools on my own--by trial and error--or by consulting other practitioners. I also have an affinity for academics and academic research, and I often tweet about research papers that I come across and am intrigued by. Often, academic research results don't immediately translate to what I do, but I recently came across ideas from several research projects that are worth sharing with a wider audience.


Model evaluation, model selection, and algorithm selection in machine learning

#artificialintelligence

In contrast to k-nearest neighbors, a simple example of a parametric method would be logistic regression, a generalized linear model with a fixed number of model parameters: a weight coefficient for each feature variable in the dataset plus a bias (or intercept) unit. While the learning algorithm optimizes an objective function on the training set (with exception to lazy learners), hyperparameter optimization is yet another task on top of it; here, we typically want to optimize a performance metric such as classification accuracy or the area under a Receiver Operating Characteristic curve. Thinking back of our discussion about learning curves and pessimistic biases in Part II, we noted that a machine learning algorithm often benefits from more labeled data; the smaller the dataset, the higher the pessimistic bias and the variance -- the sensitivity of our model towards the way we partition the data. We start by splitting our dataset into three parts, a training set for model fitting, a validation set for model selection, and a test set for the final evaluation of the selected model.


Model evaluation, model selection, and algorithm selection in machine learning

#artificialintelligence

Almost every machine learning algorithm comes with a large number of settings that we, the machine learning researchers and practitioners, need to specify. These tuning knobs, the so-called hyperparameters, help us control the behavior of machine learning algorithms when optimizing for performance, finding the right balance between bias and variance. Hyperparameter tuning for performance optimization is an art in itself, and there are no hard-and-fast rules that guarantee best performance on a given dataset. In Part I and Part II, we saw different holdout and bootstrap techniques for estimating the generalization performance of a model. We learned about the bias-variance trade-off, and we computed the uncertainty of our estimates. In this third part, we will focus on different methods of cross-validation for model evaluation and model selection. We will use these cross-validation techniques to rank models from several hyperparameter configurations and estimate how well they generalize to independent datasets. Previously, we used the holdout method or different flavors of bootstrapping to estimate the generalization performance of our predictive models.


Model evaluation, model selection, and algorithm selection in machine learning

#artificialintelligence

In the previous article (Part I), we introduced the general ideas behind model evaluation in supervised machine learning. We discussed the holdout method, which helps us to deal with real world limitations such as limited access to new, labeled data for model evaluation. Using the holdout method, we split our dataset into two parts: A training and a test set. First, we provide the training data to a supervised learning algorithm. The learning algorithm builds a model from the training set of labeled observations.


Model evaluation, model selection, and algorithm selection in machine learning - Part II

#artificialintelligence

In the previous article (Part I), we introduced the general ideas behind model evaluation in supervised machine learning. We discussed the holdout method, which helps us to deal with real world limitations such as limited access to new, labeled data for model evaluation. Using the holdout method, we split our dataset into two parts: A training and a test set. First, we provide the training data to a supervised learning algorithm. The learning algorithm builds a model from the training set of labeled observations.


Applying Supervised Learning Algorithms and a New Feature Selection Method to Predict Coronary Artery Disease

arXiv.org Machine Learning

From a fresh data science perspective, this thesis discusses the prediction of coronary artery disease based on genetic variations at the DNA base pair level, called Single-Nucleotide Polymorphisms (SNPs), collected from the Ontario Heart Genomics Study (OHGS). First, the thesis explains two commonly used supervised learning algorithms, the k-Nearest Neighbour (k-NN) and Random Forest classifiers, and includes a complete proof that the k-NN classifier is universally consistent in any finite dimensional normed vector space. Second, the thesis introduces two dimensionality reduction steps, Random Projections, a known feature extraction technique based on the Johnson-Lindenstrauss lemma, and a new method termed Mass Transportation Distance (MTD) Feature Selection for discrete domains. Then, this thesis compares the performance of Random Projections with the k-NN classifier against MTD Feature Selection and Random Forest, for predicting artery disease based on accuracy, the F-Measure, and area under the Receiver Operating Characteristic (ROC) curve. The comparative results demonstrate that MTD Feature Selection with Random Forest is vastly superior to Random Projections and k-NN. The Random Forest classifier is able to obtain an accuracy of 0.6660 and an area under the ROC curve of 0.8562 on the OHGS genetic dataset, when 3335 SNPs are selected by MTD Feature Selection for classification. This area is considerably better than the previous high score of 0.608 obtained by Davies et al. in 2010 on the same dataset.