A Study of Parallel Perturbative Gradient Descent
Motivated by difficulties in the analog VLSI implementation of back-propagation [Rumelhart et al., 1986] and of related algorithms that calculate gradients from detailed knowledge of the neural network model, several similar recent papers proposed a parallel [Alspector et al., 1993, Cauwenberghs, 1993, Kirk et al., 1993] or a semi-parallel [Flower and Jabri, 1993] perturbative technique, which has the property that it measures (with the physical neural network) rather than calculates the gradient. This technique is closely related to methods of stochastic approximation [Kushner and Clark, 1978], which have recently been investigated by workers in fields other than neural networks.
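A minimal sketch of the shared idea (function names and the quadratic toy loss are illustrative, not any one paper's method): all weights are perturbed simultaneously with random signs, the change in the measured loss gives a finite-difference gradient estimate, and the weights move against it, in the simultaneous-perturbation style of stochastic approximation.

```python
import numpy as np

def perturbative_step(w, loss, sigma=1e-3, lr=0.1, rng=None):
    # Perturb every weight at once with a random sign pattern and use the
    # measured loss difference as an elementwise gradient estimate.
    rng = rng or np.random.default_rng()
    delta = sigma * rng.choice([-1.0, 1.0], size=w.shape)
    g = (loss(w + delta) - loss(w)) / delta   # finite difference per weight
    return w - lr * g

# Toy usage: descend a quadratic "measured" loss.
w = np.ones(4)
for _ in range(200):
    w = perturbative_step(w, lambda v: float(np.sum(v ** 2)))
```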
An Analog Neural Network Inspired by Fractal Block Coding
Pineda, Fernando J., Andreou, Andreas G.
We consider the problem of decoding block-coded data using a physical dynamical system. We sketch out a decompression algorithm for fractal block codes and then show how to implement a recurrent neural network using physically simple but highly nonlinear analog circuit models of neurons and synapses. The nonlinear system has many fixed points, but we have at our disposal a procedure for choosing the parameters in such a way that only one solution, the desired solution, is stable. As a partial proof of the concept, we present experimental data from a small system: a 16-neuron analog CMOS chip fabricated in a 2 μm analog p-well process. This chip operates in the subthreshold regime and, for each choice of parameters, converges to a unique stable state. Each state exhibits a qualitatively fractal shape.
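For intuition, a hedged software sketch of fractal block decoding (block sizes and the 2:1 shrink are illustrative assumptions, not the paper's circuit): each range block is rewritten as a scaled, offset copy of a shrunken domain block, and iterating this contractive map from any starting signal converges to its unique fixed point.

```python
import numpy as np

def fractal_decode(transforms, n=16, block=4, iters=60):
    # Iterate the block-coded map from an arbitrary start; with every
    # |scale| < 1 the map is a contraction, so it converges to its
    # unique fixed point, the decoded (fractal-shaped) signal.
    x = np.zeros(n)
    for _ in range(iters):
        y = np.empty_like(x)
        for r, (d, scale, offset) in enumerate(transforms):
            domain = x[d:d + 2 * block].reshape(block, 2).mean(axis=1)  # 2:1 shrink
            y[r * block:(r + 1) * block] = scale * domain + offset
        x = y
    return x

# Four range blocks of length 4, domain blocks of length 8.
print(fractal_decode([(0, 0.5, 0.1), (8, -0.4, 0.3), (4, 0.6, -0.2), (2, 0.5, 0.0)]))
```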
An Input Output HMM Architecture
Bengio, Yoshua, Frasconi, Paolo
We introduce a recurrent architecture having a modular structure and formulate a training procedure based on the EM algorithm. The resulting model has similarities to hidden Markov models, but supports a recurrent-network processing style and allows the supervised learning paradigm to be exploited while using maximum likelihood estimation. 1 INTRODUCTION Learning problems involving sequentially structured data cannot be dealt with effectively by static models such as feedforward networks. Recurrent networks make it possible to model complex dynamical systems and to store and retrieve contextual information in a flexible way. Up to the present time, research efforts in supervised learning for recurrent networks have almost exclusively focused on error minimization by gradient descent methods. Although these are effective for learning short-term memories, practical difficulties have been reported in training recurrent neural networks to perform tasks in which the temporal contingencies present in the input/output sequences span long intervals (Bengio et al., 1994; Mozer, 1992).
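A minimal sketch of the forward (likelihood) pass of such an input/output HMM, with hypothetical callables trans_fn and emit_fn standing in for the input-conditioned subnetworks; this recursion is the building block of the EM-based training procedure.

```python
import numpy as np

def iohmm_forward(inputs, outputs, trans_fn, emit_fn, n_states):
    # trans_fn(u) -> (n_states, n_states) row-stochastic matrix and
    # emit_fn(u, y) -> P(y | state, u) of shape (n_states,), both
    # conditioned on the current input u.
    alpha = np.full(n_states, 1.0 / n_states)   # uniform initial state
    loglik = 0.0
    for u, y in zip(inputs, outputs):
        alpha = (alpha @ trans_fn(u)) * emit_fn(u, y)
        z = alpha.sum()
        loglik += np.log(z)
        alpha /= z                              # normalize to avoid underflow
    return loglik
```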
Stochastic Dynamics of Three-State Neural Networks
We present here an analysis of the stochastic neurodynamics of a neural network composed of three-state neurons described by a master equation. An outer-product representation of the master equation is employed. In this representation, an extension of the analysis from two-state to three-state neurons is easily performed. We apply this formalism with approximation schemes to a simple three-state network and compare the results with Monte Carlo simulations.
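As a hedged illustration of the Monte Carlo side of such a comparison (couplings J, biases h, and Gibbs-type transition rates are assumptions, not the paper's exact kinetics), one can sample a single realization of three-state dynamics whose ensemble behavior a master equation describes:

```python
import numpy as np

def monte_carlo_three_state(J, h, beta=1.0, steps=20000, seed=0):
    # Sequential resampling over states {-1, 0, +1}: one neuron at a
    # time is redrawn from the Gibbs distribution of its local field.
    rng = np.random.default_rng(seed)
    states = np.array([-1.0, 0.0, 1.0])
    s = rng.choice(states, size=len(h))
    for _ in range(steps):
        i = rng.integers(len(h))
        field = J[i] @ s - J[i, i] * s[i] + h[i]
        p = np.exp(beta * states * field)
        s[i] = rng.choice(states, p=p / p.sum())
    return s
```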
On-line Learning of Dichotomies
Barkai, N., Seung, H. S., Sompolinsky, H.
The performance of online algorithms for learning dichotomies is studied. In online learning, the number of examples P is equivalent to the learning time, since each example is presented only once. The learning curve, or generalization error as a function of P, depends on the schedule at which the learning rate is lowered. For a target that is a perceptron rule, the learning curve of the perceptron algorithm can decrease as fast as P^-1 if the schedule is optimized. If the target is not realizable by a perceptron, the perceptron algorithm does not generally converge to the solution with lowest generalization error.
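A small numerical sketch of the setting (the 1/t annealing schedule and constants here are assumptions for illustration, not the paper's optimized schedule): a perceptron student sees each example of a random perceptron teacher once, and the generalization error is the normalized angle between student and teacher.

```python
import numpy as np

def perceptron_learning_curve(P, n=100, eta0=2.0, seed=0):
    # Single pass over P examples; the learning rate is lowered as eta0/t.
    rng = np.random.default_rng(seed)
    teacher = rng.standard_normal(n)
    w = np.zeros(n)
    for t in range(1, P + 1):
        x = rng.standard_normal(n)
        y = np.sign(teacher @ x)
        if np.sign(w @ x) != y:          # perceptron rule: update on mistakes
            w += (eta0 / t) * y * x
    # Generalization error = angle between student and teacher over pi.
    cos = (w @ teacher) / (np.linalg.norm(w) * np.linalg.norm(teacher) + 1e-12)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)) / np.pi)
```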
Capacity and Information Efficiency of a Brain-like Associative Net
Graham, Bruce, Willshaw, David
We have determined the capacity and information efficiency of an associative net configured in a brain-like way with partial connectivity and noisy input cues. Recall theory was used to calculate the capacity when pattern recall is achieved using a winners-take-all strategy. Transforming the dendritic sum according to input activity and unit usage can greatly increase the capacity of the associative net under these conditions. This corresponds to the level of connectivity commonly seen in the brain and invites speculation that the brain is connected in the most information-efficient way. 1 INTRODUCTION Standard network associative memories become more plausible as models of associative memory in the brain if they incorporate (1) partial connectivity, (2) sparse activity and (3) recall from noisy cues. In this paper we consider the capacity of a binary associative net (Willshaw, Buneman, & Longuet-Higgins, 1969; Willshaw, 1971; Buckingham, 1991) containing these features. While the associative net is a very simple model of associative memory, its behaviour as a storage device is not trivial, and yet it is tractable to theoretical analysis.
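A hedged sketch of the underlying binary associative net with winner-take-all recall (pattern generation, partial connectivity, and the dendritic-sum transform are omitted; k stands for the number of active units per stored pattern):

```python
import numpy as np

def store_patterns(pairs, n_in, n_out):
    # Willshaw associative net: a synapse is switched on if its input
    # and output units are ever co-active in a stored pair.
    W = np.zeros((n_out, n_in), dtype=bool)
    for x, y in pairs:
        W |= np.outer(y.astype(bool), x.astype(bool))
    return W

def recall_wta(W, cue, k):
    # Winner-take-all recall: the k output units with the largest
    # dendritic sums fire.
    sums = W.astype(int) @ cue.astype(int)
    out = np.zeros(W.shape[0], dtype=int)
    out[np.argsort(sums)[-k:]] = 1
    return out
```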
A Charge-Based CMOS Parallel Analog Vector Quantizer
Cauwenberghs, Gert, Pedroni, Volnei
We present an analog VLSI chip for parallel analog vector quantization. The MOSIS 2.0 μm double-poly CMOS Tiny chip contains an array of 16 x 16 charge-based distance-estimation cells, implementing a mean absolute difference (MAD) metric operating on a 16-input analog vector field and 16 analog template vectors.
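Functionally, the chip performs the following nearest-template search (a software sketch with illustrative names; on-chip, all 16 MAD distances are computed concurrently by the cell array):

```python
import numpy as np

def mad_nearest_template(x, templates):
    # One mean-absolute-difference distance per template vector,
    # computed here by broadcasting rather than in parallel hardware.
    d = np.abs(templates - x).mean(axis=1)
    return int(np.argmin(d)), d

winner, dists = mad_nearest_template(np.random.rand(16), np.random.rand(16, 16))
```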
Predicting the Risk of Complications in Coronary Artery Bypass Operations using Neural Networks
Lippmann, Richard P., Kukolich, Linda, Shahian, David
MLP networks provided slightly better risk prediction than conventional logistic regression when used to predict the risk of death, stroke, and renal failure on 1257 patients who underwent coronary artery bypass operations. Bootstrap sampling was required to compare approaches, and regularization provided by early stopping was an important component of improved performance. A simplified approach to generating confidence intervals for MLP risk predictions using an auxiliary "confidence MLP" was also developed. The confidence MLP is trained to reproduce the confidence bounds that were generated during training by 50 MLP networks trained using bootstrap samples. Current research is validating these results using larger data sets, exploring approaches to detect outlier patients who are so different from any training patient that accurate risk prediction is suspect, developing approaches to explaining which input features are important for an individual patient, and determining why MLP networks provide improved performance.
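A hedged sketch of the bootstrap procedure described above, using scikit-learn as a stand-in for the paper's networks (hidden_layer_sizes, n_boot=50, and the 95% percentile bounds are illustrative assumptions):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.utils import resample

def bootstrap_risk_bounds(X, y, X_new, n_boot=50):
    # Train n_boot MLPs on bootstrap resamples, regularized by early
    # stopping, and take percentile bounds on the predicted risk; a
    # separate "confidence MLP" could then be trained to reproduce
    # these bounds directly from the inputs.
    preds = []
    for b in range(n_boot):
        Xb, yb = resample(X, y, random_state=b)
        net = MLPClassifier(hidden_layer_sizes=(10,), early_stopping=True,
                            max_iter=500, random_state=b)
        net.fit(Xb, yb)
        preds.append(net.predict_proba(X_new)[:, 1])
    lo, hi = np.percentile(np.array(preds), [2.5, 97.5], axis=0)
    return lo, hi
```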