Cross Validation


Statistical Theory of Overtraining - Is Cross-Validation Asymptotically Effective?

Neural Information Processing Systems

A statistical theory for overtraining is proposed. The analysis treats realizable stochastic neural networks trained with Kullback-Leibler loss in the asymptotic case. It is shown that the asymptotic gain in generalization error from early stopping is small, even if we have access to the optimal stopping time. Considering cross-validation stopping, we answer the question: in what ratio should the examples be divided into training and testing sets to obtain optimum performance? In the non-asymptotic region, cross-validated early stopping always decreases the generalization error. Our large-scale simulations on a CM5 are in good agreement with our analytical findings.
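
The cross-validated stopping rule analyzed here amounts to holding out part of the sample and halting training at the minimum of the held-out loss. The following is a minimal sketch under stated assumptions, not the authors' implementation: `model`, `fit_one_epoch`, and `loss` are hypothetical placeholders, and `split_ratio` stands in for the training/testing ratio the paper optimizes.

```python
import numpy as np

def early_stopping_split(X, y, model, split_ratio=0.8, max_epochs=100, patience=5):
    """Hold out a test set and stop training when its loss stops improving."""
    n_train = int(split_ratio * len(X))           # split_ratio = training fraction
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_val, y_val = X[n_train:], y[n_train:]

    best_loss, best_epoch, since_best = np.inf, 0, 0
    for epoch in range(max_epochs):
        model.fit_one_epoch(X_tr, y_tr)           # one pass of training (hypothetical API)
        val_loss = model.loss(X_val, y_val)       # held-out proxy for generalization error
        if val_loss < best_loss:
            best_loss, best_epoch, since_best = val_loss, epoch, 0
        else:
            since_best += 1
            if since_best >= patience:            # approximate the optimal stopping time
                break
    return best_epoch, best_loss
```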


A Bound on the Error of Cross Validation Using the Approximation and Estimation Rates, with Consequences for the Training-Test Split

Neural Information Processing Systems

We work in a setting in which we must choose the right number of parameters for a hypothesis function in response to a finite training sample, with the goal of minimizing the resulting generalization error. There is a large and interesting literature on cross validation methods, which often emphasizes asymptotic statistical properties, or the exact calculation of the generalization error for simple models. Our approach here is somewhat different, and is primarily inspired by two sources. The first is the work of Barron and Cover [2], who introduced the idea of bounding the error of a model selection method (in their case, the Minimum Description Length Principle) in terms of a quantity known as the index of resolvability. The second is the work of Vapnik [5], who provided extremely powerful and general tools for uniformly bounding the deviations between training and generalization errors. We combine these methods to give a new and general analysis of cross validation performance. In the first and more formal part of the paper, we give a rigorous bound on the error of cross validation in terms of two parameters of the underlying model selection problem: the approximation rate and the estimation rate. In the second and more experimental part of the paper, we investigate the implications of our bound for choosing γ, the fraction of data withheld for testing in cross validation. The most interesting aspect of this analysis is the identification of several qualitative properties of the optimal γ that appear to be invariant over a wide class of model selection problems: when the target function complexity is small compared to the sample size, the performance of cross validation is relatively insensitive to the choice of γ.
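
As an illustration of the setting, the sketch below selects the number of parameters by training each candidate on a (1 - γ) fraction of the data and scoring it on the withheld γ fraction. Polynomial degree is used purely as a hypothetical stand-in for the hypothesis classes; this is not the paper's experimental setup.

```python
import numpy as np

def select_by_holdout(X, y, max_degree=10, gamma=0.2, seed=0):
    """Choose model complexity by error on a withheld gamma-fraction of the data."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_test = int(gamma * len(X))                 # gamma = fraction withheld for testing
    test, train = idx[:n_test], idx[n_test:]

    best_deg, best_err = None, np.inf
    for d in range(1, max_degree + 1):           # candidate numbers of parameters
        coeffs = np.polyfit(X[train], y[train], d)
        pred = np.polyval(coeffs, X[test])
        err = np.mean((pred - y[test]) ** 2)     # held-out estimate of generalization error
        if err < best_err:
            best_deg, best_err = d, err
    return best_deg, best_err
```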


Neural Network Ensembles, Cross Validation, and Active Learning

Neural Information Processing Systems

It is well known that a combination of many different predictors can improve predictions. In the neural networks community, "ensembles" of neural networks have been investigated by several authors; see, for instance, [1, 2, 3]. Most often the networks in the ensemble are trained individually and then their predictions are combined. This combination is usually done by majority vote (in classification) or by simple averaging (in regression), but one can also use a weighted combination of the networks.
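
The combination rules mentioned above reduce to a few lines of array arithmetic. This is a hedged sketch assuming each member of `models` exposes a `predict()` method returning per-sample outputs (a generic assumption, not the paper's networks).

```python
import numpy as np

def ensemble_regression(models, X, weights=None):
    """Simple (weights=None) or weighted averaging of member predictions."""
    preds = np.stack([m.predict(X) for m in models])      # shape: (n_models, n_samples)
    return np.average(preds, axis=0, weights=weights)

def ensemble_classification(models, X):
    """Majority vote over member class-label predictions."""
    preds = np.stack([m.predict(X) for m in models]).astype(int)
    counts = np.zeros((preds.max() + 1, preds.shape[1]), dtype=int)
    for row in preds:                                      # tally each member's vote
        counts[row, np.arange(preds.shape[1])] += 1
    return counts.argmax(axis=0)                           # winning class per sample
```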


Cross-Validation Estimates IMSE

Neural Information Processing Systems

Integrated Mean Squared Error (IMSE) is a version of the usual mean squared error criterion, averaged over all possible training sets of a given size. If it could be observed, it could be used to determine optimal network complexity or optimal data subsets for efficient training. We show that two common methods of cross-validating average squared error deliver unbiased estimates of IMSE, converging to IMSE with probability one. These estimates thus make possible approximate IMSE-based choice of network complexity. We also show that two variants of the cross-validation measure provide unbiased IMSE-based estimates potentially useful for selecting optimal data subsets. 1 Summary To begin, assume we are given a fixed network architecture. Let z^N denote a given set of N training examples.
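
For concreteness, one of the cross-validated average squared error measures in question can be sketched as follows; `fit` and `predict` are hypothetical callables standing in for training the fixed architecture on a subset of the N examples, not the paper's estimators.

```python
import numpy as np

def delete_one_cv_ase(X, y, fit, predict):
    """Delete-one cross-validated average squared error over the N examples."""
    N = len(X)
    errors = np.empty(N)
    for i in range(N):
        keep = np.arange(N) != i                  # hold out the i-th example
        model = fit(X[keep], y[keep])             # retrain on the remaining N-1 examples
        errors[i] = (predict(model, X[i]) - y[i]) ** 2
    return errors.mean()                          # the quantity related to IMSE
```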


Neural Network Model Selection Using Asymptotic Jackknife Estimator and Cross-Validation Method

Neural Information Processing Systems

Two theorems and a lemma are presented about the use of the jackknife estimator and the cross-validation method for model selection. Theorem 1 gives the asymptotic form of the jackknife estimator. Combined with the model selection criterion, this asymptotic form can be used to obtain the fit of a model. The model selection criterion we used is the negative of the average predictive likelihood, the choice of which is based on the idea of the cross-validation method. Lemma 1 provides a formula for further exploration of the asymptotics of the model selection criterion. Theorem 2 gives an asymptotic form of the model selection criterion for the regression case, when the parameter optimization criterion has a penalty term. Theorem 2 also proves the asymptotic equivalence of Moody's model selection criterion (Moody, 1992) and the cross-validation method, when the distance measure between the response y and the regression function takes the form of a squared difference. 1 INTRODUCTION Selecting a model for a specified problem is the key to generalization based on the training data set.
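
A rough sketch of the cross-validation-style criterion described above, read as the negative average predictive log-likelihood of each held-out example under parameters fit to the rest; `fit` and `log_likelihood` are hypothetical placeholders, and this is an illustrative reading rather than the paper's jackknife estimator.

```python
import numpy as np

def negative_avg_predictive_likelihood(X, y, fit, log_likelihood):
    """Cross-validation-style model selection criterion (smaller is better)."""
    N = len(X)
    scores = np.empty(N)
    for i in range(N):
        keep = np.arange(N) != i
        theta = fit(X[keep], y[keep])                       # parameters without example i
        scores[i] = log_likelihood(theta, X[i], y[i])       # predictive fit on example i
    return -scores.mean()

def select_model(candidates, X, y):
    """Pick the (fit, log_likelihood) candidate with the lowest criterion."""
    crits = [negative_avg_predictive_likelihood(X, y, f, ll) for f, ll in candidates]
    return int(np.argmin(crits)), crits
```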

