
Cross Validation


Mass-Univariate Hypothesis Testing on MEEG Data using Cross-Validation

arXiv.org Machine Learning

Recent advances in statistical theory, together with increases in computational power, provide alternative methods for mass-univariate hypothesis testing, in which a large number of univariate tests can be properly used to compare MEEG data across many time-frequency points and scalp locations. A major difficulty with this kind of mass-univariate analysis is the sheer number of hypothesis tests performed, so procedures that remove or alleviate the inflated probability of false discoveries are crucial. Here, I propose a new method for mass-univariate analysis of MEEG data based on a cross-validation scheme. The method uses a hierarchical classification procedure under k-fold cross-validation to detect which sensors, time bins, and frequency bins contribute to discriminating between two different stimuli or tasks. To this end, a new feature extraction method based on the discrete cosine transform (DCT) is employed to take maximum advantage of all three data dimensions. Combining cross-validation and the hierarchical architecture with the DCT feature space makes the method more reliable while keeping it sensitive enough to detect narrow effects in brain activity.
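A minimal sketch of the general idea, not the author's exact hierarchical pipeline: low-order 3-D DCT coefficients computed over the sensor, time, and frequency dimensions of each epoch are fed to a classifier evaluated under k-fold cross-validation. The array shapes, the number of retained coefficients, and the logistic-regression classifier are illustrative assumptions.

```python
# Sketch only: 3-D DCT features + k-fold CV for two-class MEEG decoding.
# Shapes, the classifier, and the number of retained DCT coefficients are
# illustrative assumptions, not the author's exact hierarchical procedure.
import numpy as np
from scipy.fft import dctn
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_sensors, n_times, n_freqs = 80, 32, 50, 20
X = rng.standard_normal((n_trials, n_sensors, n_times, n_freqs))
y = rng.integers(0, 2, n_trials)            # two stimuli / tasks

def dct_features(epoch, keep=8):
    """Low-order 3-D DCT coefficients over sensor x time x frequency."""
    coefs = dctn(epoch, norm="ortho")
    return coefs[:keep, :keep, :keep].ravel()

feats = np.array([dct_features(e) for e in X])
scores = cross_val_score(LogisticRegression(max_iter=1000), feats, y, cv=5)
print("5-fold accuracy: %.2f +/- %.2f" % (scores.mean(), scores.std()))
```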


Futility Analysis in the Cross-Validation of Machine Learning Models

arXiv.org Machine Learning

Many machine learning models have important structural tuning parameters that cannot be directly estimated from the data. The common tactic for setting these parameters is to use resampling methods, such as cross-validation or the bootstrap, to evaluate a candidate set of values and choose the best based on some pre-defined criterion. Unfortunately, this process can be time-consuming. However, the model tuning process can be streamlined by adaptively resampling candidate values so that settings that are clearly sub-optimal can be discarded. The notion of futility analysis is introduced in this context. An example is shown that illustrates how adaptive resampling can be used to reduce training time. Simulation studies are used to understand how the potential speed-up is affected by parallel processing techniques.
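A rough sketch of the adaptive-resampling idea described above: candidates are evaluated fold by fold, and settings that fall clearly behind the current best are discarded early. The candidate grid, the burn-in of three folds, and the futility margin are simplified assumptions, not the paper's exact procedure.

```python
# Sketch: evaluate tuning candidates fold-by-fold and drop ones that are
# clearly behind the current best (a simplified futility rule).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=400, random_state=0)
candidates = {C: [] for C in (0.01, 0.1, 1.0, 10.0, 100.0)}  # assumed grid
folds = list(KFold(n_splits=10, shuffle=True, random_state=0).split(X, y))

for i, (tr, te) in enumerate(folds):
    for C in list(candidates):
        model = SVC(C=C).fit(X[tr], y[tr])
        candidates[C].append(model.score(X[te], y[te]))
    if i >= 2:  # after a burn-in, discard clearly sub-optimal settings
        best = max(np.mean(s) for s in candidates.values())
        candidates = {C: s for C, s in candidates.items()
                      if np.mean(s) > best - 0.05}  # assumed futility margin

best_C = max(candidates, key=lambda C: np.mean(candidates[C]))
print("selected C =", best_C)
```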


Estimating the Maximum Expected Value: An Analysis of (Nested) Cross Validation and the Maximum Sample Average

arXiv.org Machine Learning

We investigate the accuracy of the two most common estimators for the maximum expected value of a general set of random variables: a generalization of the maximum sample average, and cross validation. No unbiased estimator exists and we show that it is non-trivial to select a good estimator without knowledge about the distributions of the random variables. We investigate and bound the bias and variance of the aforementioned estimators and prove consistency. The variance of cross validation can be significantly reduced, but not without risking a large bias. The bias and variance of different variants of cross validation are shown to be very problem-dependent, and a wrong choice can lead to very inaccurate estimates.
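A small numerical sketch of the two estimators compared above: the maximum sample average, and a two-fold cross-validation estimator that selects the best-looking variable on one half of the data and evaluates it on the other half (averaged over both directions). The Gaussian distributions, sample sizes, and split are illustrative assumptions.

```python
# Sketch: maximum sample average vs. a simple cross-validation estimator
# of max_i E[X_i].  Distributions and the 2-fold split are illustrative.
import numpy as np

rng = np.random.default_rng(0)
true_means = np.zeros(10)            # all variables have mean 0 here
n, trials = 50, 5000
max_avg, cv_est = [], []

for _ in range(trials):
    samples = rng.normal(true_means, 1.0, size=(n, 10))
    max_avg.append(samples.mean(axis=0).max())           # biased upward
    a, b = samples[: n // 2], samples[n // 2 :]           # 2-fold CV:
    i = a.mean(axis=0).argmax()                           # select on fold A,
    j = b.mean(axis=0).argmax()                           # select on fold B,
    cv_est.append(0.5 * (b.mean(axis=0)[i] + a.mean(axis=0)[j]))  # cross-evaluate

print("true max of means: 0.0")
print("maximum sample average (mean): %.3f" % np.mean(max_avg))
print("cross-validation estimate (mean): %.3f" % np.mean(cv_est))
```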


An Empirical Comparison of V-fold Penalisation and Cross Validation for Model Selection in Distribution-Free Regression

arXiv.org Machine Learning

Model selection is a crucial issue in machine learning, and a wide variety of penalisation methods (with possibly data-dependent complexity penalties) have recently been introduced for this purpose. However, their empirical performance is generally not well documented in the literature. The goal of this paper is to investigate to what extent such recent techniques can be successfully used for tuning both the regularisation and kernel parameters in support vector regression (SVR) and the complexity measure in regression trees (CART). This task is traditionally solved via V-fold cross-validation (VFCV), which gives efficient results for a reasonable computational cost. A disadvantage of VFCV, however, is that the procedure is known to provide an asymptotically suboptimal risk estimate as the number of examples tends to infinity. Recently, a penalisation procedure called V-fold penalisation has been proposed to improve on VFCV, supported by theoretical arguments. Here we report on an extensive set of experiments comparing V-fold penalisation and VFCV for SVR/CART calibration on several benchmark datasets. We highlight cases in which VFCV and V-fold penalisation, respectively, provide poor estimates of the risk, and introduce a modified penalisation technique to reduce the estimation error.
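For concreteness, a minimal sketch of the VFCV baseline the paper compares against: tuning the SVR regularisation and kernel parameters by V-fold cross-validation over a grid. The grid, V = 5, and the synthetic data are assumptions for illustration; the paper's V-fold penalisation procedure is not reproduced here.

```python
# Sketch of the VFCV baseline: tune SVR's C and kernel width by V-fold CV.
# The grid, V=5, and the synthetic data are assumptions for illustration.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
grid = {"C": [0.1, 1.0, 10.0, 100.0], "gamma": [0.01, 0.1, 1.0]}
search = GridSearchCV(SVR(kernel="rbf"), grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
print("selected parameters:", search.best_params_)
```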


Concentration inequalities of the cross-validation estimate for stable predictors

arXiv.org Machine Learning

In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for stable predictors in the context of risk assessment. The notion of stability was first introduced by \cite{DEWA79} and extended by \cite{KEA95}, \cite{BE01} and \cite{KUNIY02} to characterize classes of predictors with infinite VC dimension; in particular, this covers $k$-nearest-neighbor rules, the Bayesian algorithm (\cite{KEA95}), boosting, and others. General loss functions and classes of predictors are considered. We use the formalism introduced by \cite{DUD03} to cover a large variety of cross-validation procedures, including leave-one-out cross-validation, $k$-fold cross-validation, hold-out cross-validation (or split sample), and leave-$\upsilon$-out cross-validation. In particular, we give a simple rule on how to choose the cross-validation procedure, depending on the stability of the class of predictors. In the special case of uniform stability, an interesting consequence is that the number of elements in the test set is not required to grow to infinity for the consistency of the cross-validation procedure. In this special case, the particular interest of leave-one-out cross-validation is emphasized.
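As a concrete instance of the quantity whose concentration is studied, a sketch of the cross-validation estimate of the generalization error for a k-nearest-neighbours rule, one of the predictor classes mentioned above, under both 10-fold and leave-one-out schemes. The data, the number of neighbours, and the fold counts are illustrative assumptions.

```python
# Sketch: cross-validation estimates of generalization error for a k-NN
# rule.  Data, k, and the fold counts are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)

kfold_err = 1.0 - cross_val_score(knn, X, y, cv=10).mean()            # 10-fold CV
loo_err = 1.0 - cross_val_score(knn, X, y, cv=LeaveOneOut()).mean()   # leave-one-out
print("10-fold CV error estimate:    %.3f" % kfold_err)
print("leave-one-out error estimate: %.3f" % loo_err)
```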


Estimating Subagging by cross-validation

arXiv.org Machine Learning

In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for subagged estimators, both for classification and regression. General loss functions and classes of predictors with both finite and infinite VC dimension are considered. We slightly generalize the formalism introduced by \cite{DUD03} to cover a large variety of cross-validation procedures, including leave-one-out cross-validation, $k$-fold cross-validation, hold-out cross-validation (or split sample), and leave-$\upsilon$-out cross-validation. An interesting consequence is that the probability upper bound is the minimum of a Hoeffding-type bound and a Vapnik-type bound, and is thus smaller than 1 even for small learning sets. Finally, we give a simple rule for subagging the predictor.
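A minimal sketch of a subagged (subsample-aggregated) estimator and its k-fold cross-validation error estimate, the quantity the bounds above concern. The base learner, the 50% subsample fraction, and the number of subsamples are illustrative assumptions.

```python
# Sketch: a subagged (subsample-aggregated) classifier and its 10-fold CV
# error estimate.  Base learner, subsample fraction (50%) and the number
# of subsamples are illustrative assumptions.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, clone
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

class Subagger(BaseEstimator, ClassifierMixin):
    def __init__(self, base=None, n_estimators=25, frac=0.5, seed=0):
        self.base, self.n_estimators, self.frac, self.seed = base, n_estimators, frac, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        m = int(self.frac * len(X))           # subsample size, drawn without replacement
        self.models_ = []
        for _ in range(self.n_estimators):
            idx = rng.choice(len(X), size=m, replace=False)
            self.models_.append(clone(self.base).fit(X[idx], y[idx]))
        return self

    def predict(self, X):
        votes = np.mean([m.predict(X) for m in self.models_], axis=0)
        return (votes >= 0.5).astype(int)     # majority vote for 0/1 labels

X, y = make_classification(n_samples=300, random_state=0)
sub = Subagger(base=DecisionTreeClassifier(random_state=0))
err = 1.0 - cross_val_score(sub, X, y, cv=10).mean()
print("10-fold CV error of the subagged tree: %.3f" % err)
```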


Concentration inequalities of the cross-validation estimator for Empirical Risk Minimiser

arXiv.org Machine Learning

In this article, we derive concentration inequalities for the cross-validation estimate of the generalization error for empirical risk minimizers. In the general setting, we prove sanity-check bounds in the spirit of \cite{KR99}: ``bounds showing that the worst-case error of this estimate is not much worse than that of the training error estimate''. General loss functions and classes of predictors with finite VC dimension are considered. We closely follow the formalism introduced by \cite{DUD03} to cover a large variety of cross-validation procedures, including leave-one-out cross-validation, $k$-fold cross-validation, hold-out cross-validation (or split sample), and leave-$\upsilon$-out cross-validation. In particular, we focus on proving the consistency of the various cross-validation procedures, and we point out the interest of each in terms of rate of convergence. An estimation curve with transition phases, depending on the cross-validation procedure and not only on the percentage of observations in the test sample, gives a simple rule on how to choose the cross-validation procedure. An interesting consequence is that the size of the test sample is not required to grow to infinity for the consistency of the cross-validation procedure.
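To make the compared quantities concrete, a sketch contrasting the training-error (resubstitution) estimate with hold-out and k-fold cross-validation estimates for an empirical risk minimiser over a finite-VC class, here a linear classifier. The data and split sizes are illustrative assumptions.

```python
# Sketch: training error vs. hold-out and 10-fold CV estimates for an
# empirical risk minimiser over a finite-VC class (a linear classifier).
# Data and split sizes are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)
clf = LinearSVC()

train_err = 1.0 - clf.fit(X, y).score(X, y)                 # resubstitution
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
holdout_err = 1.0 - clf.fit(X_tr, y_tr).score(X_te, y_te)   # hold-out (split sample)
kfold_err = 1.0 - cross_val_score(clf, X, y, cv=10).mean()  # 10-fold CV

print("training error: %.3f  hold-out: %.3f  10-fold CV: %.3f"
      % (train_err, holdout_err, kfold_err))
```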


V-fold cross-validation improved: V-fold penalization

arXiv.org Machine Learning

We study the efficiency of V-fold cross-validation (VFCV) for model selection from the non-asymptotic viewpoint, and suggest an improvement on it, which we call ``V-fold penalization''. Considering a particular (though simple) regression problem, we prove that VFCV with a bounded V is suboptimal for model selection, because it ``overpenalizes'', all the more so as V is large. Hence, asymptotic optimality requires V to go to infinity. However, when the signal-to-noise ratio is low, it appears that overpenalizing is necessary, so that the optimal V is not always the largest one, despite the variability issue. This is confirmed by simulated data. In order to improve on the prediction performance of VFCV, we define a new model selection procedure, called ``V-fold penalization'' (penVF). It is a V-fold subsampling version of Efron's bootstrap penalties, so that it has the same computational cost as VFCV while being more flexible. In a heteroscedastic regression framework, assuming the models to have a particular structure, we prove that penVF satisfies a non-asymptotic oracle inequality with a leading constant that tends to 1 as the sample size goes to infinity. In particular, this implies adaptivity to the smoothness of the regression function, even with highly heteroscedastic noise. Moreover, it is easy to overpenalize with penVF, independently of the V parameter. A simulation study shows that this results in a significant improvement over VFCV in non-asymptotic situations.
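A rough sketch of the V-fold penalization idea: each candidate model is scored by its empirical risk plus a penalty built from V-fold subsampling (the gap between the full-sample risk and the training-block risk of estimators fitted with one block removed). The constant in front of the penalty, the regression-tree model family, and the data are simplified assumptions; see the paper for the exact definition of penVF.

```python
# Rough sketch of V-fold penalization: select a model by
# (empirical risk) + (V-fold subsampling penalty).  Constant, model family,
# and data are simplified assumptions, not the paper's exact penVF.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(6 * X[:, 0]) + 0.3 * rng.standard_normal(200)   # assumed regression data

def mse(model, X, y):
    return np.mean((model.predict(X) - y) ** 2)

V, folds = 5, list(KFold(n_splits=5, shuffle=True, random_state=0).split(X))
scores = {}
for depth in (1, 2, 3, 4, 6, 8, 10):                        # candidate models
    full = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X, y)
    pen = 0.0
    for tr, _ in folds:                                     # V-fold subsampling penalty
        part = DecisionTreeRegressor(max_depth=depth, random_state=0).fit(X[tr], y[tr])
        pen += mse(part, X, y) - mse(part, X[tr], y[tr])
    pen *= (V - 1) / V                                      # simplified constant
    scores[depth] = mse(full, X, y) + pen

best = min(scores, key=scores.get)
print("depth selected by (sketch of) V-fold penalization:", best)
```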



No Unbiased Estimator of the Variance of K-Fold Cross-Validation

Neural Information Processing Systems

Most machine learning researchers perform quantitative experiments to estimate generalization error and compare algorithm performances.