Handwritten Digit Recognition with a Back-Propagation Network
LeCun, Yann, Boser, Bernhard E., Denker, John S., Henderson, Donnie, Howard, R. E., Hubbard, Wayne E., Jackel, Lawrence D.
We present an application of back-propagation networks to handwritten digit recognition. Minimal preprocessing of the data was required, but the architecture of the network was highly constrained and specifically designed for the task. The input of the network consists of normalized images of isolated digits. The method has a 1% error rate and about a 9% reject rate on zipcode digits provided by the U.S. Postal Service.
1 INTRODUCTION
The main point of this paper is to show that large back-propagation (BP) networks can be applied to real image-recognition problems without a large, complex preprocessing stage requiring detailed engineering. Unlike most previous work on the subject (Denker et al., 1989), the learning network is directly fed with images, rather than feature vectors, thus demonstrating the ability of BP networks to deal with large amounts of low-level information. Previous work performed on simple digit images (Le Cun, 1989) showed that the architecture of the network strongly influences the network's generalization ability. Good generalization can only be obtained by designing a network architecture that contains a certain amount of a priori knowledge about the problem. The basic design principle is to minimize the number of free parameters that must be determined by the learning algorithm, without overly reducing the computational power of the network.
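To make the free-parameter argument concrete, here is a minimal sketch comparing a fully connected layer with a layer built from locally connected feature maps whose weights are shared across positions. Weight sharing is only one way to constrain an architecture and is assumed here for illustration. The layer sizes (a 16x16 input, 5x5 kernels, 4 feature maps, one bias per map) and the function names are hypothetical and are not taken from the abstract.

```python
def fully_connected_params(n_inputs, n_outputs):
    """Free parameters of a dense layer: one weight per input plus a bias, per output unit."""
    return (n_inputs + 1) * n_outputs


def shared_weight_params(kernel_size, n_feature_maps):
    """Free parameters when every unit in a feature map reuses one kernel and one bias."""
    return (kernel_size * kernel_size + 1) * n_feature_maps


def feature_map_side(input_side, kernel_size):
    """Side length of a feature map for stride 1 and no padding."""
    return input_side - kernel_size + 1


# Hypothetical sizes, chosen only for illustration.
INPUT_SIDE = 16
KERNEL_SIZE = 5
N_MAPS = 4

side = feature_map_side(INPUT_SIDE, KERNEL_SIZE)
n_hidden = N_MAPS * side * side  # same number of hidden units in both designs
dense = fully_connected_params(INPUT_SIDE * INPUT_SIDE, n_hidden)
shared = shared_weight_params(KERNEL_SIZE, N_MAPS)

print("hidden units:", n_hidden)
print("fully connected free parameters:", dense)
print("shared-weight free parameters:", shared)
```

Both designs compute the same number of hidden-unit activations, so the constrained layer keeps much of the computational structure while the learning algorithm has far fewer parameters to determine.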
Associative Memory in a Simple Model of Oscillating Cortex
A generic model of oscillating cortex, which assumes "minimal" coupling justified by known anatomy, is shown to function as an associative memory, using previously developed theory. The network has explicit excitatory neurons with local inhibitory interneuron feedback that forms a set of nonlinear oscillators coupled only by long-range excitatory connections. Using a local Hebb-like learning rule for primary and higher-order synapses at the ends of the long-range connections, the system learns to store the kinds of oscillation amplitude patterns observed in olfactory and visual cortex. This rule is derived from a more general "projection algorithm" for recurrent analog networks that analytically guarantees content-addressable memory storage of continuous periodic sequences, with a capacity of N/2 Fourier components for an N-node network and no "spurious" attractors.
1 Introduction
This is a sketch of recent results stemming from work which is discussed completely in [1, 2, 3]. Patterns of 40 to 80 Hz oscillation have been observed in the large-scale activity of olfactory cortex [4] and visual neocortex [5], and shown to predict the olfactory and visual pattern recognition responses of a trained animal.
Second International Workshop on User Modeling
The Second International Workshop on User Modeling was held March 30 to April 1, 1990, in Honolulu, Hawaii. The general chairperson was Dr. Wolfgang Wahlster of the University of Saarbrucken; the program and local arrangements chairperson was Dr. David Chin of the University of Hawaii at Manoa. The workshop was sponsored by AAAI and the University of Hawaii, with AAAI providing eight travel stipends for students.
Pulse-Firing Neural Chips for Hundreds of Neurons
Brownlow, Michael, Tarassenko, Lionel, Murray, Alan F., Hamilton, Alister, Han, Il Song, Reekie, H. Martin
Univ. of Edinburgh
We announce new CMOS synapse circuits using only three and four MOSFETs/synapse. Neural states are asynchronous pulse streams, upon which arithmetic is performed directly. Chips implementing over 100 fully programmable synapses are described and projections to networks of hundreds of neurons are made.
1 OVERVIEW OF PULSE FIRING NEURAL VLSI
The inspiration for the use of pulse firing in silicon neural networks is clearly the electrical/chemical pulse mechanism in "real" biological neurons. Neurons fire voltage pulses of a frequency determined by their level of activity but of a constant magnitude (usually 5 Volts) [Murray, 1989a]. As indicated in Figure 1, synapses perform arithmetic directly on these asynchronous pulses, to increment or decrement the receiving neuron's activity.
A Cost Function for Internal Representations
Krogh, Anders, Thorbergsson, C. I., Hertz, John A.
We introduce a cost function for learning in feed-forward neural networks which is an explicit function of the internal representation in addition to the weights. The learning problem can then be formulated as two simple perceptrons and a search for internal representations. Back-propagation is recovered as a limit. The frequency of successful solutions is better for this algorithm than for back-propagation when weights and hidden units are updated on the same timescale, i.e., once every learning step.
1 INTRODUCTION
In their review of back-propagation in layered networks, Rumelhart et al. (1986) describe the learning process in terms of finding good "internal representations" of the input patterns on the hidden units. However, the search for these representations is an indirect one, since the variables which are adjusted in its course are the connection weights, not the activations of the hidden units themselves when specific input patterns are fed into the input layer. Rather, the internal representations are represented implicitly in the connection weight values. More recently, Grossman et al. (1988 and 1989) suggested a way in which the search for internal representations could be made much more explicit.
Sequential Decision Problems and Neural Networks
Barto, A. G., Sutton, R. S., Watkins, C. J. C. H.
Decision-making tasks that involve delayed consequences are very common yet difficult to address with supervised learning methods. If there is an accurate model of the underlying dynamical system, then these tasks can be formulated as sequential decision problems and solved by Dynamic Programming. This paper discusses reinforcement learning in terms of the sequential decision framework and shows how a learning algorithm similar to the one implemented by the Adaptive Critic Element used in the pole-balancer of Barto, Sutton, and Anderson (1983), and further developed by Sutton (1984), fits into this framework. Adaptive neural networks can play significant roles as modules for approximating the functions required for solving sequential decision problems.
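As an illustration of solving a sequential decision problem by Dynamic Programming when an accurate model is available, the following is a minimal sketch of value iteration on a small deterministic chain with a delayed reward. It is a textbook dynamic-programming routine, not the Adaptive Critic Element algorithm discussed in the paper. The four-state chain, the discount factor, and all function names are assumptions made for the example.

```python
def value_iteration(transitions, rewards, gamma=0.9, tol=1e-8):
    """Compute optimal state values for a deterministic model.

    transitions[s][a] is the next state and rewards[s][a] the immediate
    reward for taking action a in state s.
    """
    values = [0.0] * len(transitions)
    while True:
        delta = 0.0
        for s in range(len(transitions)):
            # Bellman optimality backup using the known model.
            best = max(
                rewards[s][a] + gamma * values[transitions[s][a]]
                for a in range(len(transitions[s]))
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(transitions, rewards, values, gamma=0.9):
    """Pick, in each state, the action with the highest one-step lookahead value."""
    policy = []
    for s in range(len(transitions)):
        returns = [
            rewards[s][a] + gamma * values[transitions[s][a]]
            for a in range(len(transitions[s]))
        ]
        policy.append(returns.index(max(returns)))
    return policy


# Hypothetical chain: states 0..3, action 0 moves left, action 1 moves right.
# State 3 is absorbing; the only reward is earned on the step that enters it.
transitions = [[0, 1], [0, 2], [1, 3], [3, 3]]
rewards = [[0.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0]]

values = value_iteration(transitions, rewards)
policy = greedy_policy(transitions, rewards, values)
print("state values:", values)
print("greedy policy:", policy)
```

The reward arrives only at the end of the chain, yet the backed-up values of earlier states are positive and the greedy policy moves right from every non-terminal state, which is the sense in which the delayed consequence is propagated back through the model.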
Optimal Brain Damage
LeCun, Yann, Denker, John S., Solla, Sara A.
We have used information-theoretic ideas to derive a class of practical and nearly optimal schemes for adapting the size of a neural network. By removing unimportant weights from a network, several improvements can be expected: better generalization, fewer training examples required, and improved speed of learning and/or classification. The basic idea is to use second-derivative information to make a tradeoff between network complexity and training set error. Experiments confirm the usefulness of the methods on a real-world application.
1 INTRODUCTION
Most successful applications of neural network learning to real-world problems have been achieved using highly structured networks of rather large size [for example (Waibel, 1989; Le Cun et al., 1990a)]. As applications become more complex, the networks will presumably become even larger and more structured.
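One way to read the second-derivative idea is sketched below. It assumes a diagonal approximation to the Hessian of the training error, so that deleting weight w_k is estimated to raise the error by a saliency of h_kk * w_k^2 / 2. The weight and curvature values, and the function names, are hypothetical and serve only to show the ranking step. Computing the Hessian diagonal and retraining after pruning are outside the scope of this sketch.

```python
def saliencies(weights, hessian_diag):
    """Estimated increase in training error from deleting each weight,
    under a diagonal second-order approximation: 0.5 * h_kk * w_k**2."""
    return [0.5 * h * w * w for w, h in zip(weights, hessian_diag)]


def prune_lowest(weights, hessian_diag, n_prune):
    """Return a copy of weights with the n_prune lowest-saliency entries set to zero."""
    scores = saliencies(weights, hessian_diag)
    order = sorted(range(len(weights)), key=lambda k: scores[k])
    pruned = list(weights)
    for k in order[:n_prune]:
        pruned[k] = 0.0
    return pruned


# Hypothetical values: the largest weight sits in a very flat direction of
# the error surface, so its estimated saliency is the smallest.
weights = [2.0, -0.1, 0.5, 1.0]
hessian_diag = [0.001, 4.0, 0.2, 1.0]

scores = saliencies(weights, hessian_diag)
pruned = prune_lowest(weights, hessian_diag, n_prune=2)
print("saliencies:", scores)
print("pruned weights:", pruned)
```

In this example the weight with the largest magnitude is removed first because the error is nearly flat along it, which is where a curvature-based criterion departs from simply deleting the smallest weights.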