Inductive Learning
Sequential Decision Problems and Neural Networks
Decision making tasks that involve delayed consequences are very common yet difficult to address with supervised learning methods. If there is an accurate model of the underlying dynamical system, then these tasks can be formulated as sequential decision problems and solved by Dynamic Programming. This paper discusses rein(cid:173) forcement learning in terms of the sequential decision framework and shows how a learning algorithm similar to the one implemented by the Adaptive Critic Element used in the pole-balancer of Barto, Sutton, and Anderson (1983), and further developed by Sutton (1984), fits into this framework. Adaptive neural networks can play significant roles as modules for approximating the functions required for solving sequential decision problems.
Optimal Brain Damage
We have used information-theoretic ideas to derive a class of prac(cid:173) tical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, sev(cid:173) eral improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative informa(cid:173) tion to make a tradeoff between network complexity and training set error. Experiments confirm the usefulness of the methods on a real-world application.
The Recurrent Cascade-Correlation Architecture
Recurrent Cascade-Correlation CRCC) is a recurrent version of the Cascade(cid:173) Correlation learning architecture of Fah I man and Lebiere [Fahlman, 1990]. RCC can learn from examples to map a sequence of inputs into a desired sequence of outputs. New hidden units with recurrent connections are added to the network as needed during training. In effect, the network builds up a finite-state machine tailored specifically for the current problem. RCC retains the advantages of Cascade-Correlation: fast learning, good generalization, automatic construction of a near-minimal multi-layered network, and incremental training.
VLSI Implementation of TInMANN
A massively parallel, all-digital, stochastic architecture - TlnMAN N - is described which performs competitive and Kohonen types of learning. A VLSI design is shown for a TlnMANN neuron which fits within a small, inexpensive MOSIS TinyChip frame, yet which can be used to build larger networks of several hundred neurons. The neuron operates at a speed of 15 MHz which allows the network to process 290,000 training examples per second. Use of level sensitive scan logic provides the chip with 100% fault coverage, permitting very reliable neural systems to be built.
Extensions of a Theory of Networks for Approximation and Learning: Outliers and Negative Examples
Learning an input-output mapping from a set of examples can be regarded as synthesizing an approximation of a multi-dimensional function. From this point of view, this form of learning is closely related to regularization theory, and we have previously shown (Poggio and Girosi, 1990a, 1990b) the equivalence between reglilari at.ioll and a. class of three-layer networks that we call regularization networks.
Induction of Finite-State Automata Using Second-Order Recurrent Networks
Second-order recurrent networks that recognize simple finite state lan(cid:173) guages over {0,1}* are induced from positive and negative examples. Us(cid:173) ing the complete gradient of the recurrent network and sufficient training examples to constrain the definition of the language to be induced, solu(cid:173) tions are obtained that correctly recognize strings of arbitrary length. A method for extracting a finite state automaton corresponding to an opti(cid:173) mized network is demonstrated.
Repeat Until Bored: A Pattern Selection Strategy
An alternative to the typical technique of selecting training examples independently from a fixed distribution is fonnulated and analyzed, in which the current example is presented repeatedly until the error for that item is reduced to some criterion value,; then, another item is ran(cid:173) domly selected. The convergence time can be dramatically increased or decreased by this heuristic, depending on the task, and is very sensitive to the value of .
Statistical Mechanics of Learning in a Large Committee Machine
We use statistical mechanics to study generalization in large com(cid:173) mittee machines. For an architecture with nonoverlapping recep(cid:173) tive fields a replica calculation yields the generalization error in the limit of a large number of hidden units. For continuous weights the generalization error falls off asymptotically inversely proportional to Q, the number of training examples per weight. For binary weights we find a discontinuous transition from poor to perfect generalization followed by a wide region of metastability. Broken replica symmetry is found within this region at low temperatures.
Supervised Learning with Growing Cell Structures
We present a new incremental radial basis function network suit(cid:173) able for classification and regression problems. Center positions are continuously updated through soft competitive learning. The width of the radial basis functions is derived from the distance to topological neighbors. During the training the observed error is accumulated locally and used to determine where to insert the next unit. This leads (in case of classification problems) to the placement of units near class borders rather than near frequency peaks as is done by most existing methods.