 Inductive Learning




VLSI Implementation of TInMANN

Neural Information Processing Systems

A massively parallel, all-digital, stochastic architecture, TInMANN, is described that performs competitive and Kohonen types of learning. A VLSI design is shown for a TInMANN neuron that fits within a small, inexpensive MOSIS TinyChip frame, yet can be used to build larger networks of several hundred neurons. The neuron operates at a speed of 15 MHz, which allows the network to process 290,000 training examples per second. Use of level-sensitive scan logic provides the chip with 100% fault coverage, permitting very reliable neural systems to be built.
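The two figures quoted above imply a rough per-example cycle budget. The sketch below only divides one quoted number by the other; the resulting figure is derived here, not stated in the abstract, and it assumes the network handles one training example at a time at the quoted clock rate.

```python
# Back-of-the-envelope check using only the two figures quoted in the abstract.
CLOCK_HZ = 15_000_000          # 15 MHz neuron clock
EXAMPLES_PER_SECOND = 290_000  # quoted training throughput

# Assumption: examples are processed strictly one after another.
cycles_per_example = CLOCK_HZ / EXAMPLES_PER_SECOND
print(f"about {cycles_per_example:.1f} clock cycles per training example")
```

Under that assumption, each training example takes on the order of fifty clock cycles.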




The Recurrent Cascade-Correlation Architecture

Neural Information Processing Systems

Scott E. Fahlman, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213

Abstract: Recurrent Cascade-Correlation (RCC) is a recurrent version of the Cascade-Correlation learning architecture of Fahlman and Lebiere [Fahlman, 1990]. RCC can learn from examples to map a sequence of inputs into a desired sequence of outputs. New hidden units with recurrent connections are added to the network as needed during training. In effect, the network builds up a finite-state machine tailored specifically for the current problem. RCC retains the advantages of Cascade-Correlation: fast learning, good generalization, automatic construction of a near-minimal multi-layered network, and incremental training. Initially the network contains only inputs, output units, and the connections between them. This single layer of connections is trained (using the Quickprop algorithm [Fahlman, 1988]) to minimize the error.
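To make the idea of a hidden unit with a recurrent connection concrete, here is a minimal sketch of one such unit run over an input sequence. The abstract says only that hidden units have recurrent connections; the single self-feedback weight, the tanh activation, the zero initial state, and the name `recurrent_unit_outputs` are all assumptions made for illustration, and no training is shown.

```python
import math

def recurrent_unit_outputs(inputs, weights, self_weight, bias=0.0):
    """Run one hidden unit over a sequence and return its output at each step.

    Assumed form: out(t) = tanh(bias + w . x(t) + self_weight * out(t - 1)).
    """
    prev = 0.0  # assumed initial state before the sequence starts
    outputs = []
    for x in inputs:
        net = bias + sum(w * xi for w, xi in zip(weights, x)) + self_weight * prev
        prev = math.tanh(net)
        outputs.append(prev)
    return outputs

# A single input pulse followed by silence.
pulse = [[1.0], [0.0], [0.0]]
latched = recurrent_unit_outputs(pulse, weights=[2.0], self_weight=2.0)
memoryless = recurrent_unit_outputs(pulse, weights=[2.0], self_weight=0.0)
print(latched)
print(memoryless)
```

With a strong positive self-weight the unit stays high after the pulse has gone, which is the kind of stored state a finite-state machine needs; with the self-weight set to zero the output falls back to zero as soon as the input does.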


Training a 3-Node Neural Network is NP-Complete

Neural Information Processing Systems

We consider a 2-layer, 3-node, n-input neural network whose nodes compute linear threshold functions of their inputs. We show that it is NP-complete to decide whether there exist weights and thresholds for the three nodes of this network so that it will produce output consistent with a given set of training examples. We extend the result to other simple networks. This result suggests that those looking for perfect training algorithms cannot escape inherent computational difficulties just by considering only simple or very regular networks. It also suggests the importance, given a training problem, of finding an appropriate network and input encoding for that problem. It is left as an open problem to extend our result to nodes with nonlinear functions such as sigmoids.
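The decision problem asks whether consistent weights exist; checking one given candidate is the easy half. The sketch below shows only that check for a network of two hidden threshold nodes feeding one output threshold node. The function names, the "fire when the sum is strictly positive" convention, and the XOR example are illustrative assumptions, not taken from the paper.

```python
def threshold(weights, bias, x):
    """Linear threshold unit: 1 if w . x + bias > 0, else 0 (assumed convention)."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def network_output(params, x):
    """Two hidden threshold nodes feed a single output threshold node."""
    (w1, b1), (w2, b2), (w_out, b_out) = params
    hidden = (threshold(w1, b1, x), threshold(w2, b2, x))
    return threshold(w_out, b_out, hidden)

def is_consistent(params, examples):
    """True if the network reproduces the label of every training example."""
    return all(network_output(params, x) == y for x, y in examples)

# Illustrative training set: XOR on two inputs.
xor_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
xor_params = (
    ((1, 1), -0.5),   # hidden node 1 fires on OR
    ((1, 1), -1.5),   # hidden node 2 fires on AND
    ((1, -1), -0.5),  # output fires on OR but not AND
)
print(is_consistent(xor_params, xor_examples))
```

The check runs in time linear in the number of examples and inputs. The hardness result concerns the search over all possible weights and thresholds, which this sketch does not attempt.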


Scaling and Generalization in Neural Networks: A Case Study

Neural Information Processing Systems

The issues of scaling and generalization have emerged as key issues in current studies of supervised learning from examples in neural networks. Questions such as how many training patterns and training cycles are needed for a problem of a given size and difficulty, how to represent the inputs, and how to choose useful training exemplars, are of considerable theoretical and practical importance. Several intuitive rules of thumb have been obtained from empirical studies, but as yet there are few rigorous results. In this paper we summarize a study of generalization in the simplest possible case: perceptron networks learning linearly separable functions. The task chosen was the majority function (i.e., return a 1 if a majority of the input units are on), a predicate with a number of useful properties. We find that many aspects of generalization in multilayer networks learning large, difficult tasks are reproduced in this simple domain, in which concrete numerical results and even some analytic understanding can be achieved.
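As a small illustration of the task, the sketch below trains a single perceptron on the majority function over five inputs using the classic perceptron learning rule. The learning rule, the input size, the zero initial weights, and the choice to train on every pattern are assumptions made for illustration; the paper's own experimental setup is not described in this abstract, and a generalization study would hold some patterns out of training.

```python
from itertools import product

def majority(bits):
    """Target function: 1 if more than half of the input units are on."""
    return 1 if sum(bits) > len(bits) / 2 else 0

def train_perceptron(patterns, max_epochs=1000):
    """Perceptron learning rule; returns (weights, bias, epochs used)."""
    n = len(patterns[0])
    weights, bias = [0] * n, 0
    for epoch in range(1, max_epochs + 1):
        mistakes = 0
        for bits in patterns:
            net = sum(w * b for w, b in zip(weights, bits)) + bias
            predicted = 1 if net > 0 else 0
            error = majority(bits) - predicted  # -1, 0, or +1
            if error:
                mistakes += 1
                weights = [w + error * b for w, b in zip(weights, bits)]
                bias += error
        if mistakes == 0:  # a full pass with no errors: training is done
            return weights, bias, epoch
    raise RuntimeError("did not converge within max_epochs")

patterns = list(product((0, 1), repeat=5))  # all 32 input patterns
weights, bias, epochs = train_perceptron(patterns)
print(weights, bias, epochs)
```

Because the majority function is linearly separable, the perceptron convergence theorem guarantees that this loop terminates with weights that classify every training pattern correctly.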


What Size Net Gives Valid Generalization?

Neural Information Processing Systems

We address the question of when a network can be expected to generalize from m random training examples chosen from some arbitrary probability distribution, assuming that future test examples are drawn from the same distribution. Among our results are the following bounds on appropriate sample vs. network size.