Goto

Collaborating Authors

 Inductive Learning


Scaling and Generalization in Neural Networks: A Case Study

Neural Information Processing Systems

The issues of scaling and generalization have emerged as key issues in current studies of supervised learning from examples in neural networks. Questions such as how many training patterns and training cycles are needed for a problem of a given size and difficulty, how to represent the inllUh and how to choose useful training exemplars, are of considerable theoretical and practical importance. Several intuitive rules of thumb have been obtained from empirical studies, but as yet there are few rigorous results. In this paper we summarize a study Qf generalization in the simplest possible case-perceptron networks learning linearly separable functions. The task chosen was the majority function (i.e. return a 1 if a majority of the input units are on), a predicate with a number of useful properties. We find that many aspects of.generalization in multilayer networks learning large, difficult tasks are reproduced in this simple domain, in which concrete numerical results and even some analytic understanding can be achieved.


What Size Net Gives Valid Generalization?

Neural Information Processing Systems

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size.


Linear Learning: Landscapes and Algorithms

Neural Information Processing Systems

What follows extends some of our results of [1] on learning from examples in layered feed-forward networks of linear units. In particular we examine what happens when the ntunber of layers is large or when the connectivity between layers is local and investigate some of the properties of an autoassociative algorithm. Notation will be as in [1] where additional motivations and references can be found. It is usual to criticize linear networks because "linear functions do not compute" and because several layers can always be reduced to one by the proper multiplication of matrices. However this is not the point of view adopted here.


What Size Net Gives Valid Generalization?

Neural Information Processing Systems

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probabilitydistribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size.


Training a 3-Node Neural Network is NP-Complete

Neural Information Processing Systems

We consider a 2-layer, 3-node, n-input neural network whose nodes compute linear threshold functions of their inputs. We show that it is NPcomplete to decide whether there exist weights and thresholds for the three nodes of this network so that it will produce output consistent witha given set of training examples. We extend the result to other simple networks. This result suggests that those looking for perfect training algorithms cannot escape inherent computational difficulties just by considering only simple or very regular networks. It also suggests the importance, given a training problem, of finding an appropriate network and input encoding for that problem. It is left as an open problem to extend our result to nodes with nonlinear functions such as sigmoids.


Scaling and Generalization in Neural Networks: A Case Study

Neural Information Processing Systems

The issues of scaling and generalization have emerged as key issues in current studies of supervised learning from examples in neural networks. Questions such as how many training patterns and training cycles are needed for a problem of a given size and difficulty, how to represent the inllUh and how to choose useful training exemplars, are of considerable theoretical and practical importance. Several intuitive rules of thumb have been obtained from empirical studies, but as yet there are few rigorous results.In this paper we summarize a study Qf generalization in the simplest possible case-perceptron networks learning linearly separable functions.The task chosen was the majority function (i.e. return a 1 if a majority of the input units are on), a predicate with a number ofuseful properties. We find that many aspects of.generalization in multilayer networks learning large, difficult tasks are reproduced in this simple domain, in which concrete numerical results and even some analytic understanding can be achieved.


The Fifth International Conference on Machine Learning

AI Magazine

Over the last eight years, four workshops on machine learning have been held. Participation in these workshops was by invitation only. In response to the rapid growth in the number of researchers active in machine learning, it was decided that the fifth meeting should be a conference with open attendance and full review for presented papers. Thus, the first open conference on machine learning took place 12 to 14 June 1988 at The University of Michigan at Ann Arbor.


What size net gives valid generalization?

Classics

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size. We show that if m O(W/ log N/) random examples can be loaded on a feedforward network of linear threshold functions with N nodes and W weights, so that at least a fraction 1 /2 of the examples are correctly classified, then one has confidence approaching certainty that the network will correctly classify a fraction 1 of future test examples drawn from the same distribution. Conversely, for fully-connected feedforward nets with one hidden layer, any learning algorithm using fewer than Ω(W/) random training examples will, for some distributions of examples consistent with an appropriate weight choice, fail at least some fixed fraction of the time to find a weight choice that will correctly classify more than a 1 fraction of the future test examples.



Learnability and the Vapnik-Chervonenkis dimension

Classics

In this paper we extend Valiant's model to learning concepts defined by regions in Euclidean n-dimensional space E", n 2 1. The general techniques we develop lead to new results in Boolean domains as well. Our methods are based on the pioneering work of Vapnik and Chervonenkis [6 I-631 on the distribution-free convergence of empirical probability estimates and its application to the theory of pattern recognition. These methods provide a unified treatment of some of Valiant's results, and extend previous results of Pearl [50, 5 I] and Devroye and Wagner ([ 151, see also [ 141) along with our results from [lo]. In learning a class C of concepts (e.g., subsets of E") from examples, a single target concept is selected from C and we are given a finite sequence of points in E", each labeled " 1" if it is in the target concept (a positive example) and "0" if it is not (a negative example). This set is called a sample of the target concept. A learning function for C is a function that, given a large enough randoml:y drawn sample of any target concept in C, returns a region in E" (a hypothesis) that is with high probability a good approximation to the target concept. More precisely: (1) We let P be a fixed probability distribution on E" and assume that examples are created by drawing points independently at random according to P. (2) The error of a hypothesis is taken to be the probability that it disagrees with the target concept on a randomly drawn example, that is, the error is just the probability (according to P) of the symmetric difference between the hypothesis and the target concept.