 Cross Validation


Evaluation of Machine Learning Models with Scikit-learn: Metrics and Cross-Validation – vegibit

#artificialintelligence

In machine learning, model evaluation is the process of measuring how well a model performs on a given dataset. It is an essential step in the machine learning pipeline because it helps determine the effectiveness of a model and identify areas for improvement. Model evaluation can be performed using various metrics, such as accuracy, precision, recall, and F1 score, each of which provides a different insight into the model's performance. Additionally, techniques such as cross-validation can be used to assess the generalization performance of a model and to detect overfitting. This article will explore metrics and cross-validation for evaluating machine learning models with the scikit-learn library.
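As a rough illustration of the metrics named above, the following sketch computes accuracy, precision, recall, and F1 under 5-fold cross-validation with scikit-learn's `cross_validate`. The dataset (`load_breast_cancer`), the scaled logistic regression model, and the fold count are assumptions chosen for the example; they do not come from the article.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: a binary classification dataset bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)

# Scaling lives inside the pipeline so it is re-fitted on each training fold only.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Evaluate several metrics at once over 5 folds.
scoring = ["accuracy", "precision", "recall", "f1"]
results = cross_validate(model, X, y, cv=5, scoring=scoring)

# Average each metric over the folds.
mean_scores = {metric: results[f"test_{metric}"].mean() for metric in scoring}
for metric, value in mean_scores.items():
    print(f"{metric}: {value:.3f}")
```

Reporting several metrics side by side is useful because a model can score well on accuracy while doing poorly on recall, for example when the classes are imbalanced.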


CVTT: Cross-Validation Through Time

arXiv.org Artificial Intelligence

The evaluation of recommender systems from a practical perspective is a topic of ongoing discourse within the research community. While many current evaluation methods reduce performance to a single value metric as an easy way to compare models, this approach relies on the assumption that the methods' performance remains constant over time. In this study, we examine this assumption and propose the Cross-Validation Through Time (CVTT) technique as a more comprehensive evaluation method, focusing on model performance over time. By utilizing the proposed technique, we conduct an in-depth analysis of the performance of popular RecSys algorithms. Our findings indicate that (1) the performance of the recommenders varies over time for all reviewed datasets, (2) using simple evaluation approaches can lead to a substantial decrease in performance in real-world evaluation scenarios, and (3) excessive data usage can lead to suboptimal results.
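The general idea of tracking performance over time, rather than reporting a single number, can be sketched with scikit-learn's `TimeSeriesSplit`. This is not the CVTT procedure from the paper, whose details are not given in the abstract; it is a generic expanding-window evaluation on synthetic data, and the drifting target, the `Ridge` model, and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import TimeSeriesSplit

# Synthetic, time-ordered data whose input-output relationship drifts over time.
rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 3))
drift = np.linspace(0.0, 2.0, n)
y = X[:, 0] + drift * X[:, 1] + rng.normal(scale=0.1, size=n)

# Each split trains on the past and tests on the block that follows it.
errors_over_time = []
splits = list(TimeSeriesSplit(n_splits=5).split(X))
for train_idx, test_idx in splits:
    model = Ridge().fit(X[train_idx], y[train_idx])
    predictions = model.predict(X[test_idx])
    errors_over_time.append(mean_absolute_error(y[test_idx], predictions))

# One error per time step instead of a single aggregate value.
for step, error in enumerate(errors_over_time, start=1):
    print(f"step {step}: MAE = {error:.3f}")
```

Keeping the per-step errors makes it visible when a model that looks good on average degrades in later periods.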


Benchmarking Machine Learning Models with Cross-Validation and Matplotlib in Python

#artificialintelligence

In this article, we will look at how to use Python to compare and evaluate the performance of machine learning models. We will use cross-validation with Sklearn to test the models and Matplotlib to display the results. The main motivation for doing this is to have a clear and accurate understanding of model performance and thus improve the model selection process. Cross-validation is a robust method for testing models on data other than the training data. It lets us evaluate model performance on held-out folds, that is, data that has not been used to train the model itself, which gives a more accurate estimate of model performance on real data.
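A minimal sketch of this kind of benchmark might look as follows. The three candidate models, the iris dataset, and the output file name `cv_comparison.png` are assumptions made for illustration; the article's own models and data are not shown in the excerpt.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the script runs without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hypothetical set of candidate models to compare.
models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "Tree": DecisionTreeClassifier(random_state=0),
}

# One array of 5 fold accuracies per model.
scores = {name: cross_val_score(model, X, y, cv=5) for name, model in models.items()}

# A box plot shows both the typical score and its spread across folds.
fig, ax = plt.subplots()
ax.boxplot(list(scores.values()))
ax.set_xticks(range(1, len(scores) + 1))
ax.set_xticklabels(list(scores.keys()))
ax.set_ylabel("Accuracy per fold")
ax.set_title("5-fold cross-validation comparison")
fig.savefig("cv_comparison.png")
```

Plotting the per-fold distribution, rather than only the mean, helps show whether an apparent difference between two models is larger than the fold-to-fold variation.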


Cross Validation. Cross-validation is a technique for…

#artificialintelligence

Cross-validation is a technique used to evaluate the performance of a machine learning model by training it on different subsets of the data and testing it on the remaining subset. It is also known as rotation estimation or out-of-sample testing. Rotation estimation refers to the process of rotating, or splitting, the data into different subsets. Simply put, in the process of cross-validation, the original data sample is randomly divided into several subsets.


Toward Theoretical Guidance for Two Common Questions in Practical Cross-Validation based Hyperparameter Selection

arXiv.org Artificial Intelligence

We show, to our knowledge, the first theoretical treatments of two common questions in cross-validation based hyperparameter selection: (1) After selecting the best hyperparameter using a held-out set, we train the final model using {\em all} of the training data -- since this may or may not improve future generalization error, should one do this? (2) During optimization such as via SGD (stochastic gradient descent), we must set the optimization tolerance $\rho$ -- since it trades off predictive accuracy with computation cost, how should one set it? Toward these problems, we introduce the {\em hold-in risk} (the error due to not using the whole training data), and the {\em model class mis-specification risk} (the error due to having chosen the wrong model class) in a theoretical view which is simple, general, and suggests heuristics that can be used when faced with a dataset instance. In proof-of-concept studies in synthetic data where theoretical quantities can be controlled, we show that these heuristics can, respectively, (1) always perform at least as well as always performing retraining or never performing retraining, (2) either improve performance or reduce computational overhead by $2\times$ with no loss in predictive performance.


Understanding Cross-Validation part2(Machine Learning)

#artificialintelligence

Abstract: We derive high-dimensional Gaussian comparison results for the standard V-fold cross-validated risk estimates. Our result combines a recent stability-based argument for the low-dimensional central limit theorem of cross-validation with the high-dimensional Gaussian comparison framework for sums of independent random variables. These results give new insights into the joint sampling distribution of cross-validated risks in the context of model comparison and tuning parameter selection, where the number of candidate models and tuning parameters can be larger than the fitting sample size.

Abstract: In this article we prove that estimator stability is enough to show that leave-one-out cross validation is a sound procedure, by providing concentration bounds in a general framework. In particular, we provide concentration bounds beyond Lipschitz continuity assumptions on the loss or on the estimator.


Few-Shot Calibration of Set Predictors via Meta-Learned Cross-Validation-Based Conformal Prediction

arXiv.org Artificial Intelligence

Conventional frequentist learning is known to yield poorly calibrated models that fail to reliably quantify the uncertainty of their decisions. Bayesian learning can improve calibration, but formal guarantees apply only under restrictive assumptions about correct model specification. Conformal prediction (CP) offers a general framework for the design of set predictors with calibration guarantees that hold regardless of the underlying data generation mechanism. However, when training data are limited, CP tends to produce large, and hence uninformative, predicted sets. This paper introduces a novel meta-learning solution that aims at reducing the set prediction size. Unlike prior work, the proposed meta-learning scheme, referred to as meta-XB, (i) builds on cross-validation-based CP, rather than the less efficient validation-based CP; and (ii) preserves formal per-task calibration guarantees, rather than less stringent task-marginal guarantees. Finally, meta-XB is extended to adaptive non-conformal scores, which are shown empirically to further enhance marginal per-input calibration.


Combined Pruning for Nested Cross-Validation to Accelerate Automated Hyperparameter Optimization for Embedded Feature Selection in High-Dimensional Data with Very Small Sample Sizes

arXiv.org Artificial Intelligence

Background: Embedded feature selection in high-dimensional data with very small sample sizes requires optimized hyperparameters for the model building process. For this hyperparameter optimization, nested cross-validation must be applied to avoid a biased performance estimation. The resulting repeated training with high-dimensional data leads to very long computation times. Moreover, it is likely to observe a high variance in the individual performance evaluation metrics caused by outliers in tiny validation sets. Therefore, early stopping applying standard pruning algorithms to save time risks discarding promising hyperparameter sets. Result: To speed up feature selection for high-dimensional data with tiny sample size, we adapt the use of a state-of-the-art asynchronous successive halving pruner. In addition, we combine it with two complementary pruning strategies based on domain or prior knowledge. One pruning strategy immediately stops computing trials with semantically meaningless results for the selected hyperparameter combinations. The other is a new extrapolating threshold pruning strategy suitable for nested-cross-validation with a high variance of performance evaluation metrics. In repeated experiments, our combined pruning strategy keeps all promising trials. At the same time, the calculation time is substantially reduced compared to using a state-of-the-art asynchronous successive halving pruner alone. Up to 81.3% fewer models were trained achieving the same optimization result. Conclusion: The proposed combined pruning strategy accelerates data analysis or enables deeper searches for hyperparameters within the same computation time. This leads to significant savings in time, money and energy consumption, opening the door to advanced, time-consuming analyses.
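For readers unfamiliar with nested cross-validation, the basic structure can be sketched with scikit-learn as below. This shows only plain nested cross-validation and none of the pruning strategies proposed in the paper. The synthetic data (40 samples, 200 features), the L1-penalized `LinearSVC` standing in for an embedded feature selector, and the grid over `C` are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.svm import LinearSVC

# Assumed toy setting: many more features than samples.
X, y = make_classification(
    n_samples=40, n_features=200, n_informative=5, random_state=0
)

# Inner loop: picks the hyperparameter using only the outer training fold.
inner_cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
# Outer loop: estimates how well the whole tuning procedure generalizes.
outer_cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=1)

# The L1 penalty drives many coefficients to zero, a simple form of embedded selection.
search = GridSearchCV(
    LinearSVC(penalty="l1", dual=False, max_iter=5000),
    param_grid={"C": [0.01, 0.1, 1.0]},
    cv=inner_cv,
)

# Each outer fold re-runs the full grid search on its own training data.
nested_scores = cross_val_score(search, X, y, cv=outer_cv)
print("outer-fold accuracies:", nested_scores)
print("mean accuracy:", nested_scores.mean())
```

Every outer fold repeats the complete inner search, so the number of trained models grows with the product of outer folds, inner folds, and hyperparameter candidates, which is the cost the abstract's pruning strategies aim to reduce.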


Cross-Validation in Machine Learning

#artificialintelligence

Model performance is usually estimated by dividing the known data into two parts, one to train the model and the other to test its predictions, thus obtaining the model accuracy and adjusting the model according to the results. However, accuracy depends on how we split the data, which can introduce biases that prevent the measured accuracy from generalizing to unseen data. Cross-validation is used to reduce the dependence on a single random split of the data. It tests the performance of a predictive machine learning model on the same principle as the train-test split technique, with the difference that the split is performed k times and the accuracy of each attempt is recorded. This technique is known as k-fold cross-validation, where each fold is a specific division of the data different from the rest.
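The repeated split described above can be written out explicitly with scikit-learn's `KFold`. The iris dataset, the k-nearest-neighbours classifier, and the choice of k = 5 folds are assumptions made for this sketch.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Shuffle once, then cut the data into 5 disjoint folds.
kf = KFold(n_splits=5, shuffle=True, random_state=0)

fold_accuracies = []
for train_idx, test_idx in kf.split(X):
    # A fresh model per fold: train on 4 folds, test on the remaining one.
    model = KNeighborsClassifier(n_neighbors=5)
    model.fit(X[train_idx], y[train_idx])
    predictions = model.predict(X[test_idx])
    fold_accuracies.append(accuracy_score(y[test_idx], predictions))

print("accuracy per fold:", np.round(fold_accuracies, 3))
print("mean accuracy:", np.mean(fold_accuracies))
```

Each sample is used for testing exactly once, so the mean over folds depends far less on one lucky or unlucky split than a single train-test split does.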


Random projections and Kernelised Leave One Cluster Out Cross-Validation: Universal baselines and evaluation tools for supervised machine learning for materials properties

arXiv.org Artificial Intelligence

With machine learning being a popular topic in current computational materials science literature, creating representations for compounds has become commonplace. These representations are rarely compared, as evaluating their performance - and the performance of the algorithms that they are used with - is non-trivial. With many materials datasets containing bias and skew caused by the research process, leave one cluster out cross validation (LOCO-CV) has been introduced as a way of measuring the performance of an algorithm in predicting previously unseen groups of materials. This raises the question of the impact, and control, of the range of cluster sizes on the LOCO-CV measurement outcomes. We present a thorough comparison between composition-based representations, and investigate how kernel approximation functions can be used to better separate data to enhance LOCO-CV applications. We find that domain knowledge does not improve machine learning performance in most tasks tested, with band gap prediction being the notable exception. We also find that the radial basis function improves the linear separability of chemical datasets in all 10 datasets tested and provide a framework for the application of this function in the LOCO-CV process to improve the outcome of LOCO-CV measurements regardless of machine learning algorithm, choice of metric, and choice of compound representation. We recommend kernelised LOCO-CV as a training paradigm for those looking to measure the extrapolatory power of an algorithm on materials data.
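The basic leave-one-cluster-out idea can be sketched by clustering the inputs and then holding out one cluster at a time. This covers only plain LOCO-CV; the kernelised variant and the random-projection baselines from the paper are not reproduced here. The synthetic regression data, k-means with five clusters, and the `Ridge` model are assumptions made for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Synthetic stand-in for a materials dataset (features -> property value).
X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# Group samples into clusters of similar inputs.
clusters = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)

# Each split holds out one whole cluster, so the model is always tested
# on a group of samples unlike anything it saw during training.
scores = cross_val_score(
    Ridge(),
    X,
    y,
    groups=clusters,
    cv=LeaveOneGroupOut(),
    scoring="neg_mean_absolute_error",
)

for cluster_id, score in enumerate(scores):
    print(f"held-out cluster {cluster_id}: MAE = {-score:.2f}")
```

Because the cluster sizes produced by k-means can differ widely, the held-out sets differ in size too, which is the cluster-size effect the abstract raises as an open question.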