Cross Validation
Choice of K in K-fold Cross Validation for Classification in Financial Market
Cross-validation is often used as a tool for model selection across classifiers, as discussed in detail in the following paper: https://ssrn.com/abstract. However, one question often pops up: how should K be chosen in K-fold cross-validation? The rule-of-thumb choice suggested by the literature, which is based on non-financial data, is K = 10. The question is: does this hold for financial markets?
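A minimal, do-it-yourself sketch of the kind of experiment behind this question, in base R: the simulated data, the `run_kfold_cv` helper, and the candidate values of K are all illustrative assumptions, not taken from the paper, and the random fold assignment ignores the temporal ordering that real financial data would require.

```r
# Compare K-fold CV accuracy estimates for several choices of K on toy data.
set.seed(42)
n <- 500
x <- matrix(rnorm(n * 3), ncol = 3)
y <- as.integer(x[, 1] + 0.5 * x[, 2] + rnorm(n) > 0)  # toy binary labels
dat <- data.frame(y = y, x)

run_kfold_cv <- function(data, k) {
  folds <- sample(rep(1:k, length.out = nrow(data)))   # random fold assignment
  acc <- numeric(k)
  for (i in 1:k) {
    train <- data[folds != i, ]
    test  <- data[folds == i, ]
    fit   <- glm(y ~ ., data = train, family = binomial)
    pred  <- as.integer(predict(fit, newdata = test, type = "response") > 0.5)
    acc[i] <- mean(pred == test$y)
  }
  mean(acc)  # cross-validated accuracy estimate for this K
}

sapply(c(2, 5, 10, 20), function(k) run_kfold_cv(dat, k))
```

Comparing the estimates across K gives a feel for how sensitive the answer is; on real market data the folds would additionally need to respect time order.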
Cross-validation in R: a do-it-yourself and a black box approach
In my previous post, we saw that R-squared can lead to a misleading interpretation of the quality of our regression fit in terms of prediction power. One thing that R-squared offers no protection against is overfitting. Cross-validation, on the other hand, inherently offers protection against overfitting, because the cases in our testing set are different from the cases in our training set. In the simplest such scheme, leave-one-out cross-validation, one case in our data set is used as the test set while the remaining cases are used as the training set, and we iterate through the data set until every case has served as the test set.
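A minimal sketch of that do-it-yourself leave-one-out loop in base R; the `mtcars` data and the linear model below are stand-ins for illustration, not the post's own example.

```r
# Leave-one-out CV: hold out one case at a time, fit on the rest, score the held-out case.
loocv_errors <- sapply(1:nrow(mtcars), function(i) {
  fit  <- lm(mpg ~ wt + hp, data = mtcars[-i, ])          # train on all but case i
  pred <- predict(fit, newdata = mtcars[i, , drop = FALSE])
  (pred - mtcars$mpg[i])^2                                 # squared error on the held-out case
})
mean(loocv_errors)  # leave-one-out estimate of prediction error (MSE)
```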
Cross-validation
This text is a survey on cross-validation. We define all classical cross-validation procedures, and we study their properties for two different goals: estimating the risk of a given estimator, and selecting the best estimator among a given family. For the risk estimation problem, we compute the bias (which can also be corrected) and the variance of cross-validation methods. For estimator selection, we first provide a first-order analysis (based on expectations), then explain how to take second-order terms into account (from variance computations, and from the usefulness of overpenalization). In the end, this allows us to provide some guidelines for choosing the best cross-validation method for a given learning problem.
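For concreteness, one common way to write the K-fold risk estimator discussed here (the notation below is ours, not necessarily the survey's):

```latex
% K-fold cross-validation risk estimator: the data are split into folds
% B_1, ..., B_K; \hat{s}^{(-k)} is the estimator trained without fold B_k,
% and \gamma is the loss (contrast) function.
\widehat{R}_{\mathrm{CV}}(\hat{s}) \;=\;
  \frac{1}{K} \sum_{k=1}^{K} \frac{1}{|B_k|}
  \sum_{i \in B_k} \gamma\bigl(\hat{s}^{(-k)}, (x_i, y_i)\bigr)
```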
Cross-Validation: Concept and Example in R
In machine learning, cross-validation is a resampling method used for model evaluation that avoids testing a model on the same dataset on which it was trained. Testing on the training data is a common mistake, especially when a separate testing dataset is not available, and it usually leads to inaccurate performance measures: the model will achieve an almost perfect score, since it is being tested on the same data it was trained on. To avoid this kind of mistake, cross-validation is preferred. The concept of cross-validation is actually simple: instead of training on the whole dataset and then testing on the same data, we randomly divide our data into training and testing datasets.
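A minimal sketch of that random split in R; the 70/30 proportion and the built-in `iris` data are illustrative assumptions.

```r
# Randomly divide the data into a training set and a testing set.
set.seed(1)
idx   <- sample(nrow(iris), size = 0.7 * nrow(iris))  # indices of the training rows
train <- iris[idx, ]
test  <- iris[-idx, ]
nrow(train); nrow(test)  # a model is fit on `train` and evaluated only on `test`
```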
Improving Efficiency of SVM k -Fold Cross-Validation by Alpha Seeding
Wen, Zeyi (The University of Melbourne) | Li, Bin (South China University of Technology) | Kotagiri, Ramamohanarao (The University of Melbourne) | Chen, Jian (South China University of Technology) | Chen, Yawen (South China University of Technology) | Zhang, Rui (The University of Melbourne)
K-fold cross-validation is commonly used to evaluate the effectiveness of SVMs with the selected hyper-parameters. It is known that SVM k-fold cross-validation is expensive, since it requires training k SVMs. However, little work has explored reusing the h-th SVM when training the (h+1)-th SVM to improve the efficiency of k-fold cross-validation. In this paper, we propose three algorithms that reuse the h-th SVM to speed up the training of the (h+1)-th SVM. Our key idea is to efficiently identify the support vectors of the next SVM, and to accurately estimate their associated weights (also called alpha values), by using the previous SVM. Our experimental results show that our algorithms are several times faster than k-fold cross-validation that does not reuse the previously trained SVM, while producing the same results (and hence the same accuracy).
Cross validation Deep learning
It seems to me that the above definition of the k-fold cross-validation algorithm (from the Deep Learning book by Ian Goodfellow, Yoshua Bengio and Aaron Courville, 2016) is inconsistent with the common definition of cross-validation. In the above algorithm, the vector $e$ contains the loss calculated for every particular example in the dataset $D$, and the mean of the vector $e$ is the estimate of the generalization error. In the standard definition of cross-validation, by contrast, we calculate the test error for each fold and then take the average of those fold errors.
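A small numeric check of the two conventions, with made-up loss values: pooling per-example losses and averaging per-fold means coincide when all folds have the same size, but differ otherwise.

```r
# Two folds of unequal size with illustrative per-example losses.
fold_losses <- list(c(0.1, 0.3), c(0.4, 0.5, 0.6))

pooled_mean   <- mean(unlist(fold_losses))        # book's convention: mean over all entries of e
per_fold_mean <- mean(sapply(fold_losses, mean))  # standard convention: average of fold errors

pooled_mean; per_fold_mean  # 0.38 vs. 0.35 -- equal only when folds have the same size
```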
Assessing and tuning brain decoders: cross-validation, caveats, and guidelines
Varoquaux, Gaël, Raamana, Pradeep Reddy, Engemann, Denis, Hoyos-Idrobo, Andrés, Schwartz, Yannick, Thirion, Bertrand
Decoding, i.e. prediction from brain images or signals, calls for empirical evaluation of its predictive power. Such evaluation is achieved via cross-validation, a method also used to tune decoders' hyper-parameters. This paper is a review of cross-validation procedures for decoding in neuroimaging. It includes a didactic overview of the relevant theoretical considerations. Practical aspects are highlighted with an extensive empirical study of the common decoders in within- and across-subject predictions, on multiple datasets (anatomical and functional MRI, and MEG) and simulations. Theory and experiments show that the popular "leave-one-out" strategy leads to unstable and biased estimates, and that a repeated random splits method should be preferred. Experiments also outline the large error bars of cross-validation in neuroimaging settings: typical confidence intervals of about 10%. Nested cross-validation can tune decoders' parameters while avoiding circularity bias. However, we find that it can be more favorable to use sane defaults, in particular for non-sparse decoders.
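A minimal sketch of the repeated-random-splits scheme recommended here, in R; the `iris` data and the LDA "decoder" are stand-ins for the neuroimaging setting, and 50 repeats of an 80/20 split are an arbitrary choice.

```r
library(MASS)  # for lda(); iris stands in for the brain-imaging data
set.seed(7)
acc <- replicate(50, {
  idx  <- sample(nrow(iris), size = 0.8 * nrow(iris))   # one random 80/20 split
  fit  <- lda(Species ~ ., data = iris[idx, ])
  pred <- predict(fit, newdata = iris[-idx, ])$class
  mean(pred == iris$Species[-idx])                      # accuracy on the held-out split
})
mean(acc); quantile(acc, c(0.025, 0.975))  # point estimate plus empirical error bars
```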
Cross-validation based Nonlinear Shrinkage
Many machine learning algorithms require precise estimates of covariance matrices. The sample covariance matrix performs poorly in high-dimensional settings, which has stimulated the development of alternative methods, the majority based on factor models and shrinkage. Recent work of Ledoit and Wolf has extended the shrinkage framework to Nonlinear Shrinkage (NLS), a more powerful covariance estimator based on Random Matrix Theory. Our contribution shows that, contrary to claims in the literature, cross-validation based covariance matrix estimation (CVC) yields comparable performance at strongly reduced complexity and runtime. On two real-world data sets, we show that the CVC estimator yields superior results to competing shrinkage and factor based methods.
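As a rough illustration of choosing a covariance estimator by cross-validation (and only that: this is a much-simplified linear-shrinkage stand-in, not the paper's CVC or NLS estimator), one could select a shrinkage intensity toward a scaled identity by minimizing held-out Frobenius error across folds:

```r
# Simplified stand-in: pick a linear shrinkage intensity lambda by 5-fold CV.
set.seed(3)
n <- 100; p <- 20
X <- matrix(rnorm(n * p), n, p)
folds <- sample(rep(1:5, length.out = n))

heldout_error <- function(lambda) {
  sum(sapply(1:5, function(k) {
    S_train <- cov(X[folds != k, , drop = FALSE])
    S_test  <- cov(X[folds == k, , drop = FALSE])
    Sigma   <- (1 - lambda) * S_train + lambda * mean(diag(S_train)) * diag(p)  # shrink toward scaled identity
    sum((Sigma - S_test)^2)   # Frobenius distance to the held-out sample covariance
  }))
}

lambdas <- seq(0.05, 0.95, by = 0.05)
lambdas[which.min(sapply(lambdas, heldout_error))]  # CV-selected shrinkage intensity
```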