Lapedes, Alan
Neural Network Definitions of Highly Predictable Protein Secondary Structure Classes
Lapedes, Alan, Steeg, Evan, Farber, Robert
We use two co-evolving neural networks to determine new classes of protein secondary structure that are significantly more predictable from local amino acid sequence than the conventional secondary structure classification. Accurate prediction of the conventional secondary structure classes (alpha helix, beta strand, and coil) from primary sequence has long been an important problem in computational molecular biology. Neural networks have been a popular method for predicting these conventional classes, but accuracy has been disappointingly low. The algorithm presented here uses neural networks to simultaneously examine both sequence and structure data, and to evolve new classes of secondary structure that can be predicted from sequence with significantly higher accuracy than the conventional classes. These new classes have both similarities to, and differences from, the conventional alpha helix, beta strand, and coil.
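A minimal sketch, in PyTorch, of the two-network idea described in this abstract: one network reads an encoded window of amino acid sequence, the other reads local structure features for the same residue, and both are trained jointly so that their class assignments agree. The window size, feature dimensions, and the variance-regularized agreement objective are illustrative assumptions, not the authors' exact formulation.

import torch
import torch.nn as nn

WINDOW, N_AA, STRUCT_DIM = 13, 20, 6  # assumed window and feature sizes

# Sequence network: encoded amino acid window -> probability of class membership.
seq_net = nn.Sequential(
    nn.Linear(WINDOW * N_AA, 40), nn.Tanh(),
    nn.Linear(40, 1), nn.Sigmoid())

# Structure network: local backbone geometry -> probability of class membership.
struct_net = nn.Sequential(
    nn.Linear(STRUCT_DIM, 40), nn.Tanh(),
    nn.Linear(40, 1), nn.Sigmoid())

opt = torch.optim.Adam(
    list(seq_net.parameters()) + list(struct_net.parameters()), lr=1e-3)

def agreement_loss(p, q):
    # Reward agreement between the two outputs while penalizing the trivial
    # constant solution by encouraging variance across the batch.
    return ((p - q) ** 2).mean() - 0.1 * (p.var() + q.var())

# Stand-in random tensors; in practice seq_x and struct_x would be paired
# encodings of the same residue drawn from real protein data.
for step in range(1000):
    seq_x = torch.randn(64, WINDOW * N_AA)
    struct_x = torch.randn(64, STRUCT_DIM)
    loss = agreement_loss(seq_net(seq_x), struct_net(struct_x))
    opt.zero_grad(); loss.backward(); opt.step()

The variance term is one simple way to keep the coupled networks from collapsing onto a single class; the paper's actual objective for defining maximally predictable classes may differ.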
Use of Bad Training Data for Better Predictions
Grossman, Tal, Lapedes, Alan
We show how randomly scrambling the output classes of various fractions of the training data may be used to improve the predictive accuracy of a classification algorithm. We present a method, based on scrambling the output classes, for calculating the "noise sensitivity signature" of a learning algorithm. This signature can be used to indicate a good match between the complexity of the classifier and the complexity of the data. Use of noise sensitivity signatures is distinctly different from other schemes for avoiding overtraining, such as cross-validation, which uses only part of the training data, or various penalty functions, which are not data-adaptive. Noise sensitivity signature methods use all of the training data and are manifestly data-adaptive and nonparametric. They are well suited to situations with limited training data.
1 INTRODUCTION
A major problem for pattern recognition and classification algorithms that learn from a training set of examples is selecting the complexity of the model to be trained. How can an overparameterized algorithm be prevented from "memorizing" the training data?
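A minimal sketch, using scikit-learn and synthetic data, of the label-scrambling experiment underlying the noise sensitivity signature: for each noise fraction, a random subset of training labels is reassigned, the classifier is retrained, and its behavior is recorded as a function of the noise level. Plain held-out accuracy is used here as a stand-in for the statistics the paper actually computes from the scrambled runs.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic two-class problem standing in for real training data.
X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

for rho in (0.0, 0.1, 0.2, 0.3, 0.4):
    # Scramble the labels of a fraction rho of the training examples.
    y_noisy = y_tr.copy()
    idx = rng.choice(len(y_tr), size=int(rho * len(y_tr)), replace=False)
    y_noisy[idx] = rng.integers(0, 2, size=len(idx))
    clf = MLPClassifier(hidden_layer_sizes=(10,), max_iter=500, random_state=0)
    clf.fit(X_tr, y_noisy)
    print(f"noise fraction {rho:.1f}: held-out accuracy {clf.score(X_te, y_te):.3f}")

How sharply performance degrades as rho grows is the kind of curve the signature summarizes: a classifier well matched to the data's complexity degrades gracefully, while an overparameterized one tracks the injected noise.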