Terrence L. Fine, School of Electrical Engineering, Cornell University, Ithaca, NY 14853. Abstract: The three problems that concern us are identifying a natural domain of pattern classification applications of feedforward neural networks, selecting an appropriate feedforward network architecture, and assessing the tradeoff between network complexity, training set size, and statistical reliability as measured by the probability of incorrect classification. We close with some suggestions for improving the bounds that come from Vapnik-Chervonenkis theory; these improvements can narrow, but not close, the chasm between theory and practice. Neural networks are appropriate as pattern classifiers when the pattern sources are ones of which we have little understanding, beyond perhaps a nonparametric statistical model, but we have been provided with classified samples of features drawn from each of the pattern categories. Neural networks should be able to provide rapid and reliable computation of complex decision functions. The issue in doubt is their statistical response to new inputs.
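The tradeoff between network complexity, training set size, and statistical reliability can be made concrete with a small numerical sketch. The abstract does not state a particular bound, so the formula below is an assumption: it is one commonly quoted form of Vapnik's bound, in which, with probability at least 1 - eta, the true error exceeds the training error by at most sqrt((h (ln(2n/h) + 1) - ln(eta/4)) / n), where h is the VC dimension and n the training set size. The function name and the example values of h and n are illustrative only.

```python
import math

def vc_confidence_term(n, h, eta=0.05):
    """Confidence term of one textbook form of the VC bound (an assumed form,
    not taken from the abstract): sqrt((h*(ln(2n/h) + 1) - ln(eta/4)) / n).

    n   : number of training samples
    h   : VC dimension of the classifier family (a measure of complexity)
    eta : allowed probability that the bound fails
    """
    if n <= 0 or h <= 0 or not (0.0 < eta < 1.0):
        raise ValueError("need n > 0, h > 0 and 0 < eta < 1")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)

# For a fixed, hypothetical VC dimension, watch how slowly the term shrinks
# as the training set grows. Values above 1 mean the bound says nothing,
# since an error probability can never exceed 1.
for n in (1_000, 10_000, 100_000, 1_000_000):
    print(n, round(vc_confidence_term(n, h=1_000), 3))
```

With a VC dimension comparable to the number of samples the term exceeds 1, which is one way to see the chasm between theory and practice that the abstract mentions.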
Chang and Lippmann. Feature selection and creation are two of the most important and difficult tasks in the field of pattern classification. Good features improve the performance of both conventional and neural network pattern classifiers. Exemplar selection is another task that can reduce the memory and computation requirements of a KNN classifier. These three tasks require a search through a space which is typically so large that exhaustive search is impractical. The purpose of this research was to explore the usefulness of genetic search algorithms for these tasks.
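As a rough illustration of genetic search applied to feature selection, the sketch below evolves bit masks over the features and scores each mask by the leave-one-out accuracy of a 1-nearest-neighbour classifier, minus a small penalty per selected feature. None of the details come from the abstract: the toy data, the fitness function, the tournament selection, the one-point crossover, and every parameter value are assumptions chosen to keep the example short. Feature creation and exemplar selection are not covered.

```python
import random

def make_toy_data(n=40, n_noise=6, seed=1):
    """Hypothetical data: 2 informative features followed by n_noise noise features."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        informative = [3.0 * label + rng.gauss(0, 0.5), -3.0 * label + rng.gauss(0, 0.5)]
        X.append(informative + [rng.gauss(0, 3.0) for _ in range(n_noise)])
        y.append(label)
    return X, y

def loo_1nn_accuracy(X, y, mask):
    """Leave-one-out accuracy of a 1-NN classifier using only the masked-in features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_d, best_label = float("inf"), None
        for k, xk in enumerate(X):
            if k == i:
                continue  # leave the query point out of its own neighbour search
            d = sum((xi[j] - xk[j]) ** 2 for j in cols)
            if d < best_d:
                best_d, best_label = d, y[k]
        correct += int(best_label == y[i])
    return correct / len(X)

def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (an assumed trade-off)."""
    return loo_1nn_accuracy(X, y, mask) - penalty * sum(mask)

def genetic_feature_search(X, y, pop_size=16, generations=10, p_mut=0.05, seed=0):
    """Evolve feature masks; returns the fittest mask of the final population."""
    rng = random.Random(seed)
    n_feat = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = [(fitness(X, y, m), m) for m in pop]
        best = max(scored, key=lambda t: t[0])[1]

        def pick():
            # Binary tournament: the fitter of two randomly drawn individuals.
            a, b = rng.sample(scored, 2)
            return max(a, b, key=lambda t: t[0])[1]

        children = [best[:]]  # elitism: carry the current best over unchanged
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_feat)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [bit ^ (rng.random() < p_mut) for bit in child]  # bit-flip mutation
            children.append(child)
        pop = children
    return max(pop, key=lambda m: fitness(X, y, m))

X, y = make_toy_data()
best_mask = genetic_feature_search(X, y)
print("selected mask:", best_mask)
print("leave-one-out accuracy:", loo_1nn_accuracy(X, y, best_mask))
```

The search evaluates only a few hundred of the possible masks here; with 8 features exhaustive search would still be feasible, so the example shows the mechanics rather than the payoff, which appears only when the space is too large to enumerate.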
We derive new generalization bounds, based on Rademacher complexity theory, for model selection and error estimation of linear (kernel) classifiers; the bounds exploit the availability of unlabeled samples. In particular, two results are obtained: the first shows that, using the unlabeled samples, the confidence term of the conventional bound can be reduced by a factor of three; the second shows that the unlabeled samples can be used to obtain much tighter bounds, by building localized versions of the hypothesis class containing the optimal classifier. Papers published at the Neural Information Processing Systems Conference.