AITopics

1612.04717

Genre: Research Report > New Finding (0.67)

Industry:

Health & Medicine (0.67)
Telecommunications > Networks (0.34)
Information Technology > Networks (0.34)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.82)

#artificialintelligence Sep-7-2017, 05:15:14 GMT

14 Great Articles About Cross-Validation, Model Fitting and Selection

Cross-validation is a technique used to assess the accuracy of a predictive model, based on training set data. It splits the training set into test and control sets. The test sets are used to fine-tune the model to increase performance (better classification rate or reduced errors in prediction) and the control sets are used to simulate how the model would perform outside the training set. The control and test sets must be carefully chosen for this method to make sense.

artificial intelligence, machine learning, model fitting and selection, (2 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.72)

@machinelearnbot Sep-5-2017, 20:40:15 GMT

Visualizing Cross-validation Code

Let's visualize to improve your prediction... Let us say you are writing nice, clean Machine Learning code (e.g. ...). Your prediction could be slightly under- or overfit, like the figures below. As the name suggests, cross-validation is the next fun thing after learning Linear Regression because it helps to improve your prediction using the K-Fold strategy. What is K-Fold, you ask? Everything is explained below with code.

artificial intelligence, linear regression, machine learning, (7 more...)

Industry: Education > Curriculum > Subject-Specific Education (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.45)

@machinelearnbot Sep-4-2017, 05:10:10 GMT

Cross-Validation: Concept and Example in R

In machine learning, cross-validation is a resampling method used for model evaluation that avoids testing a model on the same dataset on which it was trained. Testing on the training data is a common mistake, especially since a separate testing dataset is not always available. It usually leads to inaccurate performance measures, because the model will have an almost perfect score when it is tested on the same data it was trained on. To avoid this kind of mistake, cross-validation is usually preferred.
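The optimistic score described above can be reproduced with a small sketch. It is written in Python rather than R, and everything in it is an assumption made for illustration: the synthetic data, the 20% label noise, and the hand-written 1-nearest-neighbour classifier (`make_data`, `predict_1nn`, and `accuracy` are hypothetical helpers, not code from the article).

```python
import random

def make_data(n, rng, noise=0.2):
    """Draw n points x in [0, 1) with label 1 if x > 0.5, flipping each label with probability `noise`."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def predict_1nn(train, x):
    """Return the label of the training point closest to x."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]

def accuracy(train, data):
    """Fraction of points in `data` whose label the 1-NN model built on `train` predicts correctly."""
    hits = sum(predict_1nn(train, x) == label for x, label in data)
    return hits / len(data)

rng = random.Random(0)
train_set = make_data(100, rng)      # data the model memorises
held_out_set = make_data(100, rng)   # fresh data from the same process

train_accuracy = accuracy(train_set, train_set)
held_out_accuracy = accuracy(train_set, held_out_set)
print("accuracy on training data:", train_accuracy)
print("accuracy on held-out data:", held_out_accuracy)
```

On the training data every point is its own nearest neighbour, so the score is perfect by construction and says nothing about new data. The held-out score is the one that reflects the label noise.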

artificial intelligence, concept and example, machine learning, (2 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)

@machinelearnbot Aug-31-2017, 15:00:09 GMT

Cross-Validation Code Visualization: Kind of Fun – Towards Data Science – Medium

As the name suggests, cross-validation is the next fun thing after learning Linear Regression because it helps to improve your prediction using the K-Fold strategy. What is K-Fold, you ask? Everything is explained below with code. We copy the target in the dataset to the y variable. To see the dataset, uncomment the print line.

artificial intelligence, cross-validation code visualization, machine learning, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)

@machinelearnbot Aug-11-2017, 22:45:16 GMT

Making Predictive Models Robust: Holdout vs Cross-Validation

When evaluating machine learning models, the validation step helps you find the best parameters for your model while also preventing it from becoming overfitted. Two of the most popular strategies to perform the validation step are the hold-out strategy and the k-fold strategy. Pros of the hold-out strategy: Fully independent data; only needs to be run once so has lower computational costs. Cons of the hold-out strategy: Performance evaluation is subject to higher variance given the smaller size of the data. K-fold validation evaluates the data across the entire training set, but it does so by dividing the training set into K folds – or subsections – (where K is a positive integer) and then training the model K times, each time leaving a different fold out of the training data and using it instead as a validation set.
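The K-fold procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the article's code: `k_fold_indices`, `fit_mean`, and `k_fold_mse` are hypothetical names, and the "model" is a toy that simply predicts the mean of its training targets.

```python
import random
from statistics import mean

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def fit_mean(train_targets):
    """Toy 'model' (an assumption for illustration): always predict the training mean."""
    return mean(train_targets)

def k_fold_mse(targets, k, seed=0):
    """Train k times, each time leaving one fold out and scoring on that fold."""
    scores = []
    for held_out in k_fold_indices(len(targets), k, seed):
        held = set(held_out)
        train = [targets[i] for i in range(len(targets)) if i not in held]
        prediction = fit_mean(train)
        # Mean squared error on the fold that was left out of training.
        scores.append(mean((targets[i] - prediction) ** 2 for i in held_out))
    return scores

targets = [float(v) for v in range(20)]
fold_scores = k_fold_mse(targets, k=5)
print("per-fold MSE:", fold_scores)
print("average MSE:", mean(fold_scores))
```

Every sample lands in exactly one validation fold, so the whole training set contributes to the evaluation, at the cost of fitting the model K times instead of once.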

artificial intelligence, machine learning, training set, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.40)

Xu, Ning, Hong, Jian, Fisher, Timothy C. G.

$\left( \beta, \varpi \right)$-stability for cross-validation and the choice of the number of folds

arXiv.org Machine Learning Jul-5-2017

In this paper, we introduce a new concept of stability for cross-validation, called the $\left( \beta, \varpi \right)$-stability, and use it as a new perspective to build the general theory for cross-validation. The $\left( \beta, \varpi \right)$-stability mathematically connects the generalization ability and the stability of the cross-validated model via the Rademacher complexity. Our result reveals mathematically the effect of cross-validation from two sides: on one hand, cross-validation picks the model with the best empirical generalization ability by validating all the alternatives on test sets; on the other hand, cross-validation may compromise the stability of the model selection by causing subsampling error. Moreover, the difference between training and test errors in the $q$-th round, sometimes referred to as the generalization error, might be autocorrelated in $q$. Guided by the ideas above, the $\left( \beta, \varpi \right)$-stability helps us derive a new class of Rademacher bounds, referred to as the one-round/convoluted Rademacher bounds, for the stability of cross-validation in both the i.i.d. and non-i.i.d. cases. For both light-tail and heavy-tail losses, the new bounds quantify the stability of the one-round/average test error of the cross-validated model in terms of its one-round/average training error, the sample size $n$, the number of folds $K$, the tail property of the loss (encoded as Orlicz-$\Psi_\nu$ norms) and the Rademacher complexity of the model class $\Lambda$. The new class of bounds not only quantitatively reveals the stability of the generalization ability of the cross-validated model, it also shows empirically the optimal choice for the number of folds $K$, at which the upper bound of the one-round/average test error is lowest, or, to put it another way, where the test error is most stable.

artificial intelligence, machine learning, test error, (17 more...)

1705.07349

Genre: Research Report (0.70)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)

arXiv.org Machine Learning Jun-23-2017

Cross-validation failure: small sample sizes lead to large error bars

Varoquaux, Gaël

Predictive models ground many state-of-the-art developments in statistical brain image analysis: decoding, MVPA, searchlight, or extraction of biomarkers. The principled approach to establish their validity and usefulness is cross-validation, testing prediction on unseen data. Here, I would like to raise awareness of the error bars of cross-validation, which are often underestimated. Simple experiments show that sample sizes of many neuroimaging studies inherently lead to large error bars, e.g. $\pm$10% for 100 samples. The standard error across folds strongly underestimates them. These large error bars compromise the reliability of conclusions drawn with predictive models, such as biomarkers or methods developments where, unlike with cognitive neuroimaging MVPA approaches, more samples cannot be acquired by repeating the experiment across many subjects. Solutions to increase sample size must be investigated, tackling possible increases in heterogeneity of the data.
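The order of magnitude quoted above can be checked with a back-of-envelope calculation. The sketch below is not the paper's experiment. It assumes independent test samples, a true accuracy of 50%, and a normal approximation to the binomial distribution. The function name `binomial_half_width` and its parameters are made up for this illustration.

```python
import math

def binomial_half_width(n_samples, accuracy=0.5, z=1.96):
    """Half-width of the normal-approximation 95% interval for an
    accuracy estimated on n_samples independent test points (assumed model)."""
    return z * math.sqrt(accuracy * (1.0 - accuracy) / n_samples)

# Error bars shrink only as 1 / sqrt(n): ten times more samples
# buys roughly a threefold reduction.
for n in (30, 100, 1000):
    print(n, round(binomial_half_width(n), 3))
```

With 100 samples the half-width is 1.96 × 0.05 ≈ 0.098, i.e. roughly the ±10% mentioned in the abstract under these simplifying assumptions.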

health & medicine, modeling & simulation, neurology, (20 more...)

1706.07581

Country:

Europe > France (0.14)
North America > United States (0.14)

Genre: Research Report > Experimental Study (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine (1.00)

Technology:

Information Technology > Modeling & Simulation (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.88)

Abou-Moustafa, Karim, Szepesvari, Csaba

An a Priori Exponential Tail Bound for k-Folds Cross-Validation

arXiv.org Machine Learning Jun-19-2017

We consider a priori generalization bounds developed in terms of cross-validation estimates and the stability of learners. In particular, we first derive an exponential Efron-Stein type tail inequality for the concentration of a general function of n independent random variables. Next, under some reasonable notion of stability, we use this exponential tail bound to analyze the concentration of the k-fold cross-validation (KFCV) estimate around the true risk of a hypothesis generated by a general learning rule. While the accumulated literature has often attributed this concentration to the bias and variance of the estimator, our bound attributes this concentration to the stability of the learning rule and the number of folds k. This insight raises valid concerns related to the practical use of KFCV and suggests research directions to obtain reliable empirical estimates of the actual risk.

artificial intelligence, inequality, machine learning, (17 more...)

1706.05801

Country:

North America > Canada > Alberta (0.28)
North America > United States (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.81)

Bottegal, Giulio, Pillonetto, Gianluigi

The Generalized Cross Validation Filter

arXiv.org Machine Learning Jun-8-2017

Generalized cross validation (GCV) is one of the most important approaches used to estimate parameters in the context of inverse problems and regularization techniques. A notable example is the determination of the smoothness parameter in splines. When the data are generated by a state space model, like in the spline case, efficient algorithms are available to evaluate the GCV score with complexity that scales linearly in the data set size. However, these methods are not amenable to on-line applications since they rely on forward and backward recursions. Hence, if the objective has been evaluated at time $t-1$ and new data arrive at time $t$, then $O(t)$ operations are needed to update the GCV score. In this paper we instead show that the update cost is $O(1)$, thus paving the way to the on-line use of GCV. This result is obtained by deriving the novel GCV filter which extends the classical Kalman filter equations to efficiently propagate the GCV score over time. We also illustrate applications of the new filter in the context of state estimation and on-line regularized linear system identification.
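For orientation, the GCV score mentioned in the abstract is commonly written as follows for a linear smoother. This is the textbook form, not the paper's state-space formulation, and the symbols are assumptions introduced here: $y$ is the vector of $n$ observations, $\lambda$ the regularization parameter, and $H_\lambda$ the matrix mapping $y$ to the fitted values $\hat{y} = H_\lambda y$.

```latex
\mathrm{GCV}(\lambda)
  = \frac{\tfrac{1}{n}\,\lVert (I - H_\lambda)\, y \rVert^{2}}
         {\left[\tfrac{1}{n}\,\operatorname{tr}\!\left(I - H_\lambda\right)\right]^{2}}
```

The parameter is then chosen as the $\lambda$ that minimizes this score. Evaluating the trace term naively requires the full matrix $H_\lambda$, which is why recursions that avoid forming it are of interest.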

artificial intelligence, gcv filter, machine learning, (16 more...)

1706.02495

Country: Europe > Netherlands (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (0.61)