K-fold CV


RandALO: Out-of-sample risk estimation in no time flat

arXiv.org Machine Learning

Training machine learning models is often expensive, especially in large-data settings. Not only is there a significant cost in fitting individual models, but, even more importantly, the best model must be chosen from a set of candidates indexed by "hyperparameters", and each of these models must be fitted and evaluated in order to make the optimal selection. As a result, model selection, also called hyperparameter tuning, tends to be the most computationally expensive part of the machine learning pipeline. To evaluate models, we typically need to set aside unseen "holdout" data to estimate the risk of the model on new samples from the training distribution. When training samples are abundant, numbering in the millions or billions, we can afford to set aside a modest holdout set of tens of thousands of examples without compromising model performance.
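As a rough illustration of holdout-based model selection, the sketch below fits one model per candidate penalty and keeps the penalty with the lowest holdout squared error. This is generic holdout tuning, not the RandALO method. The one-feature, no-intercept ridge model, the 20% holdout fraction and the function names are all assumptions chosen for brevity.

```python
import random


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge slope for the no-intercept model y ~ w * x."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def mse(w, xs, ys):
    """Mean squared error of the slope w on the given points."""
    return sum((y - w * x) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_by_holdout(xs, ys, lams, holdout_frac=0.2, seed=0):
    """Fit one model per candidate penalty, then score each on a shared holdout split."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    n_hold = int(len(xs) * holdout_frac)
    hold, train = idx[:n_hold], idx[n_hold:]
    x_tr = [xs[i] for i in train]
    y_tr = [ys[i] for i in train]
    x_ho = [xs[i] for i in hold]
    y_ho = [ys[i] for i in hold]
    # Every candidate needs its own fit: this loop is the cost of tuning.
    risks = {lam: mse(fit_ridge_1d(x_tr, y_tr, lam), x_ho, y_ho) for lam in lams}
    best = min(risks, key=risks.get)
    return best, risks


# Usage: noiseless y = 2x, so the unpenalized fit should win.
xs = [float(i) for i in range(1, 51)]
ys = [2.0 * x for x in xs]
best, risks = select_by_holdout(xs, ys, [0.0, 1.0, 10.0, 100.0])
print(best)  # 0.0
```

The total cost grows with the number of candidates times the cost of a single fit. When the candidate grid is large, that product is why tuning dominates the pipeline.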


Empirical investigation of multi-source cross-validation in clinical machine learning

arXiv.org Machine Learning

Traditionally, machine learning-based clinical prediction models have been trained and evaluated on patient data from a single source, such as a hospital. Cross-validation methods can be used to estimate the accuracy of such models on new patients originating from the same source, by repeated random splitting of the data. However, such estimates tend to be highly overoptimistic when compared to accuracy obtained from deploying models to sources not represented in the dataset, such as a new hospital. The increasing availability of multi-source medical datasets provides new opportunities for obtaining more comprehensive and realistic evaluations of expected accuracy through source-level cross-validation designs. In this study, we present a systematic empirical evaluation of standard K-fold cross-validation and leave-source-out cross-validation methods in a multi-source setting. We consider the task of electrocardiogram-based cardiovascular disease classification, combining and harmonizing the openly available PhysioNet CinC Challenge 2021 and the Shandong Provincial Hospital datasets for our study. Our results show that K-fold cross-validation, on both single-source and multi-source data, systematically overestimates prediction performance when the end goal is to generalize to new sources. Leave-source-out cross-validation provides more reliable performance estimates, with close to zero bias, though with larger variability. The evaluation highlights the dangers of obtaining misleading cross-validation results on medical data and demonstrates how these issues can be mitigated when multi-source data are available.
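The sketch below shows the two split designs compared in the study. It assumes each record carries a source label; the hospital names in the usage example are hypothetical.

- **Standard K-fold** shuffles records regardless of origin, so a test fold usually contains patients from sources also seen in training.
- **Leave-source-out** holds out every record from one source per split.

```python
import random
from collections import defaultdict


def kfold_splits(n, k, seed=0):
    """Standard K-fold: shuffle record indices, ignoring which source each came from."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for fold in folds[:i] + folds[i + 1:] for j in fold)
        yield train, test


def leave_source_out_splits(sources):
    """Leave-source-out: each split holds out every record from exactly one source."""
    by_source = defaultdict(list)
    for i, s in enumerate(sources):
        by_source[s].append(i)
    for s in sorted(by_source):
        test = by_source[s]
        train = [i for i, t in enumerate(sources) if t != s]
        yield s, train, test


# Usage with hypothetical source labels for 10 records.
sources = ["hospital_A"] * 4 + ["hospital_B"] * 3 + ["hospital_C"] * 3
for held_out, train, test in leave_source_out_splits(sources):
    print(held_out, "test:", test)
```

Under leave-source-out, the number of splits equals the number of sources. With only a few sources, each estimate rests on one held-out source. This is consistent with the larger variability the abstract reports.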


Is K-fold cross validation the best model selection method for Machine Learning?

arXiv.org Artificial Intelligence

As a technique that can compactly represent complex patterns, machine learning has significant potential for predictive inference. K-fold cross-validation (CV) is the most common approach to ascertaining the likelihood that a machine learning outcome is generated by chance, and it frequently outperforms conventional hypothesis testing. This improvement uses measures obtained directly from machine learning classifications, such as accuracy, that do not have a parametric description. To approach a frequentist analysis within machine learning pipelines, a permutation test or simple statistics from data partitions (i.e., folds) can be added to estimate confidence intervals. Unfortunately, neither parametric nor non-parametric tests solve the inherent problems of partitioning small-sample datasets and learning from heterogeneous data sources. The fact that machine learning strongly depends on the learning parameters and on the distribution of data across folds recapitulates familiar difficulties around excess false positives and replication. The origins of this problem are demonstrated by simulating common experimental circumstances, including small sample sizes, low numbers of predictors, and heterogeneous data sources. A novel statistical test based on K-fold CV and the Upper Bound of the actual error (K-fold CUBV) is constructed, in which uncertain predictions of machine learning with CV are bounded by the worst case through the evaluation of concentration inequalities. Probably Approximately Correct (PAC)-Bayesian upper bounds for linear classifiers, in combination with K-fold CV, are used to estimate the empirical error. The performance on neuroimaging datasets suggests that this is a robust criterion for detecting effects, validating accuracy values obtained from machine learning while avoiding excess false positives.
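For reference, here is a plain label-permutation test on K-fold accuracy. This is the baseline the abstract mentions, not the proposed K-fold CUBV bound. The nearest-centroid classifier, 1-D features, five folds and 200 permutations are illustrative assumptions.

```python
import random


def nearest_centroid_predict(train_x, train_y, test_x):
    """Assign each test point to the class whose training mean is closest (1-D features)."""
    means = {}
    for c in sorted(set(train_y)):
        vals = [x for x, y in zip(train_x, train_y) if y == c]
        means[c] = sum(vals) / len(vals)
    return [min(means, key=lambda c: abs(x - means[c])) for x in test_x]


def kfold_accuracy(xs, ys, k=5, seed=0):
    """Pooled accuracy over K folds, using the same fold assignment for every call."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        test = idx[f::k]
        held = set(test)
        train = [i for i in idx if i not in held]
        preds = nearest_centroid_predict(
            [xs[i] for i in train], [ys[i] for i in train], [xs[i] for i in test]
        )
        correct += sum(p == ys[i] for p, i in zip(preds, test))
    return correct / len(xs)


def permutation_p_value(xs, ys, n_perm=200, k=5, seed=0):
    """Share of label shuffles whose K-fold accuracy matches or beats the observed one."""
    rng = random.Random(seed)
    observed = kfold_accuracy(xs, ys, k, seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = ys[:]
        rng.shuffle(shuffled)
        if kfold_accuracy(xs, shuffled, k, seed) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)


# Usage: two well-separated, made-up clusters of 20 points each.
xs = [i / 10 for i in range(20)] + [10 + i / 10 for i in range(20)]
ys = [0] * 20 + [1] * 20
acc, p = permutation_p_value(xs, ys)
print(acc, p)
```

With small or heterogeneous samples, the fold assignment itself moves the observed accuracy. This sketch fixes one `seed`, which hides that source of variation. That is part of the problem the abstract describes.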


Extrapolated cross-validation for randomized ensembles

arXiv.org Machine Learning

Ensemble methods such as bagging and random forests are ubiquitous in various fields, from finance to genomics. Despite their prevalence, the question of the efficient tuning of ensemble parameters has received relatively little attention. This paper introduces a cross-validation method, ECV (Extrapolated Cross-Validation), for tuning the ensemble and subsample sizes in randomized ensembles. Our method builds on two primary ingredients: initial estimators for small ensemble sizes using out-of-bag errors, and a novel risk extrapolation technique that leverages the structure of the prediction risk decomposition. By establishing uniform consistency of our risk extrapolation technique over ensemble and subsample sizes, we show that ECV yields $\delta$-optimal (with respect to the oracle-tuned risk) ensembles for squared prediction risk. Our theory accommodates general ensemble predictors, only requires mild moment assumptions, and allows for high-dimensional regimes where the feature dimension grows with the sample size. As a practical case study, we employ ECV to predict surface protein abundances from gene expressions in single-cell multiomics using random forests. In comparison to sample-split cross-validation and $K$-fold cross-validation, ECV achieves higher accuracy by avoiding sample splitting. At the same time, its computational cost is considerably lower owing to the use of the risk extrapolation technique. Additional numerical results validate the finite-sample accuracy of ECV for several common ensemble predictors under a computational constraint on the maximum ensemble size.
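The extrapolation idea can be sketched under one stated assumption. For the squared risk of an average of M conditionally independent randomized predictors, the risk decomposes as R_M = R_inf + (R_1 - R_inf) / M. Estimates at M = 1 and M = 2 therefore determine the whole curve; in ECV these come from out-of-bag errors.

The selection rule below picks the smallest M within a tolerance of the infinite-ensemble limit. It is a hypothetical illustration of δ-optimal tuning, not the paper's exact procedure, and the risk values in the usage example are made up.

```python
def extrapolate_risk(r1, r2, m):
    """Extrapolate the squared risk of an M-member ensemble from estimates at M = 1 and M = 2.

    Assumes R_M = R_inf + (R_1 - R_inf) / M. Solving for R_inf from R_1 and R_2 gives
    R_M = -(1 - 2/M) * R_1 + 2 * (1 - 1/M) * R_2.
    """
    return -(1 - 2 / m) * r1 + 2 * (1 - 1 / m) * r2


def pick_ensemble_size(r1, r2, max_m, tol):
    """Smallest M whose extrapolated risk is within tol of the limit R_inf = 2 * R_2 - R_1."""
    r_inf = 2 * r2 - r1
    for m in range(1, max_m + 1):
        if extrapolate_risk(r1, r2, m) - r_inf <= tol:
            return m
    return max_m


# Usage with hypothetical out-of-bag risk estimates for M = 1 and M = 2.
r1, r2 = 1.0, 0.8
m = pick_ensemble_size(r1, r2, max_m=100, tol=0.06)
print(m, extrapolate_risk(r1, r2, m))
```

Only ensembles of size 1 and 2 need to be evaluated; every larger size is read off the formula. This is the source of the computational savings the abstract describes.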


Uncertainty in Fairness Assessment: Maintaining Stable Conclusions Despite Fluctuations

arXiv.org Artificial Intelligence

With the current adoption of machine learning (ML) systems in social, economic, and industrial domains, concerns about the fairness of automated decisions have been added to the problem of ensuring that algorithms perform efficiently in a stable and interpretable manner. Although both aspects are measured in terms of performance metrics, fairness entails the additional challenge of incorporating sensitive information in the data, so new procedures are needed to control the stability of such outcomes. Recent ML trends increasingly encourage researchers to incorporate uncertainty into the evaluation of algorithm-based systems. To increase the transparency of algorithmic performance measures, typically for comparison purposes, some authors [3, 19] propose treating these metrics as random variables whose posterior distributions are updated through Bayesian inference. In the fair learning setting, such considerations are also necessary, especially since fairness metrics have been shown to be unstable with respect to dataset composition. In particular, Ji et al. [17] and Friedler et al. [12] showed how certain fairness metrics strongly vary, respectively, in hold-out
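The sketch below makes "treating a metric as a random variable" concrete. It is not the procedure of [3, 19]. Its assumptions are:

- The fairness metric is the gap in positive-prediction rates between two groups, a demographic-parity-style difference chosen here for illustration.
- Each group's rate has an independent Beta(1, 1) prior.
- The counts in the usage example are made up.

```python
import random


def posterior_parity_gap(pos_a, n_a, pos_b, n_b, draws=10000, seed=0):
    """Monte Carlo posterior of P(yhat=1 | group A) - P(yhat=1 | group B).

    Each group's positive rate gets a Beta(1, 1) prior, updated with its observed
    counts to a Beta(1 + positives, 1 + negatives) posterior.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(draws):
        rate_a = rng.betavariate(1 + pos_a, 1 + n_a - pos_a)
        rate_b = rng.betavariate(1 + pos_b, 1 + n_b - pos_b)
        gaps.append(rate_a - rate_b)
    gaps.sort()
    mean = sum(gaps) / draws
    # Central 95% credible interval from the sorted draws.
    lo = gaps[int(0.025 * draws)]
    hi = gaps[int(0.975 * draws) - 1]
    return mean, (lo, hi)


# Usage with made-up counts: same observed rates, two different sample sizes.
print(posterior_parity_gap(60, 100, 40, 100))
print(posterior_parity_gap(6, 10, 4, 10))
```

Both calls see the same observed rates (60% vs. 40%). With 10 records per group, however, the credible interval is far wider than with 100. This illustrates how much a fairness conclusion can depend on dataset composition.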


Random Forest Medium Article

#artificialintelligence

Originally published on Towards AI, the World's Leading AI and Technology News and Media Company. Machine learning algorithms have historically been used in the credit and fraud space.


Evaluating a Binary Classifier

#artificialintelligence

The following discusses using cross-validation to evaluate the classifier we built in the previous post, which classifies images from the MNIST dataset as either five or not five. First, let's take a brief look at the problem that cross-validation solves. When building a model, we risk overfitting to the test set if we use it to evaluate different hyperparameters, because we can keep tweaking the hyperparameters until the model performs optimally on that set. When this happens, knowledge about the test set "leaks" into the model, and the evaluation metrics no longer report on generalization.
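Below is a minimal sketch of K-fold cross-validation for an imbalanced binary target such as "five vs. not five". It assumes stratified folds, so each fold keeps roughly the same share of positives. The `fit` callable, the midpoint-threshold learner, the toy features and k = 3 are hypothetical stand-ins for the MNIST classifier from the post.

```python
import random


def stratified_kfold(labels, k=3, seed=42):
    """Yield (train, test) index lists; each fold keeps roughly the overall class ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal each class's indices round-robin across the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        held = set(test)
        train = [i for i in range(len(labels)) if i not in held]
        yield train, test


def cross_val_accuracy(features, labels, fit, k=3):
    """Fit on k-1 folds and score accuracy on the held-out fold.

    Only the data passed in (the training set) is used, so a separate
    final test set stays untouched during hyperparameter tuning.
    """
    scores = []
    for train, test in stratified_kfold(labels, k):
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        hits = sum(predict(features[i]) == labels[i] for i in test)
        scores.append(hits / len(test))
    return scores


def fit_midpoint(xs, ys):
    """Hypothetical learner: threshold halfway between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y]
    neg = [x for x, y in zip(xs, ys) if not y]
    t = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: x > t


# Usage: hypothetical 1-D features where 1.0 marks a five (10% positives).
labels = [True] * 10 + [False] * 90
features = [1.0 if y else 0.0 for y in labels]
print(cross_val_accuracy(features, labels, fit_midpoint))  # [1.0, 1.0, 1.0]
```

Stratification matters here because fives are a small minority. A purely random split could leave a fold with very few positives, making its accuracy uninformative.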


Cross-Validation in Machine Learning: How to Do It Right - neptune.ai

#artificialintelligence

In machine learning (ML), generalization usually refers to the ability of an algorithm to be effective across various inputs. It means that the ML model does not suffer performance degradation on new inputs drawn from the same distribution as the training data. For human beings, generalization is the most natural thing possible. We can classify on the fly: for example, we would recognize a dog even if we had never seen that breed before. For an ML model, however, this can be quite a challenge.


Bias & Variance in Machine Learning

#artificialintelligence

Linear regression is a machine learning algorithm used to predict a quantitative target from independent variables modeled in a linear manner, by fitting a line or a plane (or hyperplane) on which the predicted values lie. For now, let's call this the best-fit line. Training points usually do not lie exactly on the best-fit line, and that makes perfect sense, because no real data is perfect; that is why we make predictions in the first place rather than just plotting a random line. The linear regression line cannot curve to pass through all the training points, so at times it is unable to capture the true relationship accurately.
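The best-fit line can be computed with ordinary least squares. In this small sketch, the made-up training points follow y = x², so no straight line passes through all of them. The nonzero residuals are error that a linear model cannot remove, i.e. its bias on this data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx  # slope
    a = my - b * mx  # intercept: the line passes through the mean point
    return a, b


def residuals(xs, ys, a, b):
    """Vertical distance from each training point to the fitted line."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


# Usage: points on the curve y = x^2.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 4.0, 9.0]
a, b = fit_line(xs, ys)
print(a, b)  # -1.0 3.0
print(residuals(xs, ys, a, b))  # [1.0, -1.0, -1.0, 1.0]
```

The alternating sign pattern of the residuals (above, below, below, above) is the signature of a curved relationship that a straight line systematically misses.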
