Asymptotics of Gradient-based Neural Network Training Algorithms

Neural Information Processing Systems

We study the asymptotic properties of the sequence of weight-vector iterates obtained by training a multilayer feedforward neural network with a basic gradient-descent method using a fixed learning constant and no batch processing. In the one-dimensional case, an exact analysis establishes the existence of a limiting distribution that is not Gaussian in general. For the general case and small learning constant, a linearization approximation permits the application of results from the theory of random matrices to again establish the existence of a limiting distribution. We study the first few moments of this distribution to compare and contrast the results of our analysis with those of techniques of stochastic approximation.

1 INTRODUCTION

The wide applicability of neural networks to problems in pattern classification and signal processing has been due to the development of efficient gradient-descent algorithms for the supervised training of multilayer feedforward neural networks with differentiable node functions. A basic version uses a fixed learning constant and updates all weights after each training input is presented (online mode) rather than after the entire training set has been presented (batch mode). The properties of this algorithm, as exhibited by the sequence of iterates, are not yet well understood. There are at present two major approaches.
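
A minimal sketch of the training procedure the abstract analyzes: online gradient descent with a fixed learning constant, updating the weights after every single example rather than after a full pass. The network size, toy data, and the learning constant `eta` are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                  # toy inputs
y = np.tanh(X @ np.array([1.0, -2.0, 0.5]))    # toy targets

n_hidden, eta = 5, 0.05                        # eta: fixed learning constant
W1 = rng.normal(scale=0.1, size=(3, n_hidden))
W2 = rng.normal(scale=0.1, size=(n_hidden,))

for epoch in range(20):
    for x, t in zip(X, y):                     # online mode: one example at a time
        h = np.tanh(x @ W1)                    # hidden activations
        out = h @ W2                           # linear output unit
        err = out - t
        # Backpropagated gradients of the squared error (1/2) * err**2.
        grad_W2 = err * h
        grad_W1 = np.outer(x, err * W2 * (1.0 - h**2))
        W2 -= eta * grad_W2                    # fixed step, no decay schedule
        W1 -= eta * grad_W1
```

Because `eta` stays fixed, the iterates never settle to a point; they keep fluctuating around a minimum, which is why the analysis concerns a limiting distribution rather than a limit point.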



An Analog Neural Network Inspired by Fractal Block Coding

Neural Information Processing Systems

We consider the problem of decoding block-coded data using a physical dynamical system. We sketch a decompression algorithm for fractal block codes and then show how to implement it as a recurrent neural network using physically simple but highly nonlinear analog circuit models of neurons and synapses. The nonlinear system has many fixed points, but we have at our disposal a procedure for choosing the parameters in such a way that only one solution, the desired solution, is stable. As a partial proof of the concept, we present experimental data from a small system: a 16-neuron analog CMOS chip fabricated in a 2 μm analog p-well process. This chip operates in the subthreshold regime and, for each choice of parameters, converges to a unique stable state. Each state exhibits a qualitatively fractal shape.
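
A minimal sketch of fractal block decoding by fixed-point iteration, the digital analogue of the dynamics the chip implements in analog circuitry: each range block is repeatedly overwritten with an affine transform (scale times a decimated domain block, plus an offset). The 1-D signal size and the transform parameters below are illustrative assumptions.

```python
import numpy as np

N, R = 16, 4                      # signal length, range-block size
# One (domain_start, scale, offset) triple per range block; |scale| < 1
# keeps each map contractive, so the iteration has a unique fixed point.
code = [(0, 0.5, 0.2), (4, -0.4, 0.8), (8, 0.3, 0.1), (0, 0.6, -0.2)]

x = np.zeros(N)                   # any starting signal converges to the same result
for _ in range(50):               # iterate the block maps to convergence
    new = np.empty(N)
    for b, (d, s, o) in enumerate(code):
        domain = x[d:d + 2 * R]                        # domain block, twice range size
        decimated = domain.reshape(R, 2).mean(axis=1)  # shrink 2:1 by averaging
        new[b * R:(b + 1) * R] = s * decimated + o
    x = new
print(x)                          # the decoded signal: the unique stable fixed point
```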


A model of the hippocampus combining self-organization and associative memory function

Neural Information Processing Systems

A model of the hippocampus is presented which forms rapid self-organized representations of input arriving via the perforant path, performs recall of previous associations in region CA3, and compares this recall with afferent input in region CA1. This comparison drives feedback regulation of cholinergic modulation to set appropriate dynamics for learning of new representations in regions CA3 and CA1. The network responds to novel patterns with increased cholinergic modulation, allowing storage of new self-organized representations, but responds to familiar patterns with a decrease in acetylcholine, allowing recall based on previous representations. This requires selectivity of the cholinergic suppression of synaptic transmission in stratum radiatum of regions CA3 and CA1, which has been demonstrated experimentally.

1 INTRODUCTION

A number of models of hippocampal function have been developed (Burgess et al., 1994; Myers and Gluck, 1994; Touretzky et al., 1994), but remarkably few simulations have addressed hippocampal function within the constraints provided by physiological and anatomical data. Theories of the function of specific subregions of the hippocampal formation often do not address physiological mechanisms for changing dynamics between learning of novel stimuli and recall of familiar stimuli.
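
A minimal sketch of the comparison-driven feedback loop described above: recall is compared with afferent input, and a poor match (a novel pattern) raises the modulation level, switching the network toward storage, while a good match (a familiar pattern) lowers it, favoring recall. The cosine match measure, the threshold, and the two modulation levels are illustrative assumptions, not the model's actual physiological parameters.

```python
import numpy as np

def cholinergic_modulation(afferent, recalled, threshold=0.7):
    """Return a high modulation level for novel inputs, low for familiar ones."""
    match = afferent @ recalled / (
        np.linalg.norm(afferent) * np.linalg.norm(recalled) + 1e-12)
    # Low match -> novel pattern -> high ACh -> store a new representation.
    # High match -> familiar pattern -> low ACh -> recall the old one.
    return 1.0 if match < threshold else 0.1
```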


A solvable connectionist model of immediate recall of ordered lists

Neural Information Processing Systems

A model of short-term memory for serially ordered lists of verbal stimuli is proposed as an implementation of the 'articulatory loop' thought to mediate this type of memory (Baddeley, 1986). The model predicts the presence of a repeatable, time-varying 'context' signal coding the timing of items' presentation, in addition to a store of phonological information and a process of serial rehearsal. Items are associated with context nodes and phonemes by Hebbian connections showing both short- and long-term plasticity. Items are activated by phonemic input during presentation and reactivated by context and phonemic feedback during output. Serial selection of items occurs via a winner-take-all interaction amongst items, with the winner subsequently receiving decaying inhibition. An approximate analysis of error probabilities due to Gaussian noise during output is presented. The model provides an explanatory account of the probability of error as a function of serial position, list length, word length, phonemic similarity, temporal grouping, and item and list familiarity, and is proposed as the starting point for a model of rehearsal and vocabulary acquisition.
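
A minimal sketch of the serial selection step described above: the most active item wins a winner-take-all competition, receives inhibition that decays over subsequent steps, and Gaussian noise during output is what produces order errors. The activation values, noise level, and decay constant are illustrative assumptions, not the paper's fitted parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
activation = np.array([1.0, 0.9, 0.8, 0.7, 0.6])   # item support from context/phonemes
inhibition = np.zeros_like(activation)
decay = 0.5                                        # inhibition halves each step

order = []
for _ in range(len(activation)):
    noisy = activation - inhibition + rng.normal(scale=0.05, size=activation.shape)
    winner = int(np.argmax(noisy))                 # winner-take-all competition
    order.append(winner)
    inhibition[winner] += 2.0                      # suppress the just-recalled item
    inhibition *= decay                            # decaying inhibition
print(order)                                       # typically [0, 1, 2, 3, 4]
```

With a larger noise scale, transpositions of adjacent items become the most common error, which is the kind of serial-position behavior the model's error analysis addresses.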


Dynamic Modelling of Chaotic Time Series with Neural Networks

Neural Information Processing Systems

In young barn owls raised with optical prisms over their eyes, the auditory maps are shifted to stay in register with the visual map, suggesting that the visual input imposes a frame of reference on the auditory maps. However, the optic tectum, the first site of convergence of visual with auditory information, is not the site of plasticity for the shift of the auditory maps; the plasticity occurs instead in the inferior colliculus, which contains an auditory map and projects into the optic tectum. We explored a model of the owl remapping in which learning is driven by a global reinforcement signal whose delivery is controlled by visual foveation. A Hebbian learning rule gated by reinforcement learned to appropriately adjust the auditory maps. In addition, reinforcement learning preferentially adjusted the weights in the inferior colliculus, as in the owl brain, even though the weights were allowed to change throughout the auditory system. This observation raises the possibility that the site of learning does not have to be genetically specified, but could be determined by how the learning procedure interacts with the network architecture.
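
A minimal sketch of a Hebbian rule gated by a global reinforcement signal, the kind of update described above: the weight change is the usual pre-times-post product, scaled by a single scalar reward broadcast to all synapses. The layer sizes, learning rate, and the stand-in foveation reward are illustrative assumptions, not the paper's simulation details.

```python
import numpy as np

rng = np.random.default_rng(2)
n_pre, n_post, lr = 8, 4, 0.1
W = rng.normal(scale=0.1, size=(n_post, n_pre))

for trial in range(100):
    pre = rng.random(n_pre)                        # auditory input activity
    post = W @ pre                                 # map-unit responses
    # Hypothetical reward: positive when the "correct" unit responds most,
    # standing in for reinforcement delivered on successful visual foveation.
    reward = 1.0 if post.argmax() == 0 else -0.1
    W += lr * reward * np.outer(post, pre)         # reinforcement-gated Hebb rule
```

The rule is local except for the scalar `reward`, which is why, in a multi-stage system, where the weight changes end up accumulating can depend on the architecture rather than on any site-specific learning signal.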


Analysis of Unstandardized Contributions in Cross Connected Networks

Neural Information Processing Systems

Understanding knowledge representations in neural networks has been a difficult problem. Principal components analysis (PCA) of contributions (products of sending activations and connection weights) has yielded valuable insights into knowledge representations, but much of this work has focused on the correlation matrix of contributions. The present work shows that analyzing the variance-covariance matrix of contributions yields more valid insights by taking the weights into account.
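
A minimal sketch of the analysis described above: contributions are sending activations times connection weights, and PCA is run on their variance-covariance matrix rather than their correlation matrix. The random activations and weights are stand-ins for a trained network.

```python
import numpy as np

rng = np.random.default_rng(3)
acts = rng.random(size=(100, 6))          # sending activations over 100 patterns
w = rng.normal(size=6)                    # weights into one receiving unit
contrib = acts * w                        # contributions, pattern by pattern

cov = np.cov(contrib, rowvar=False)       # variance-covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # PCA: eigendecomposition of cov
print(eigvals[::-1])                      # component variances, largest first
```

The design point the abstract makes: the correlation matrix standardizes each contribution to unit variance, erasing weight magnitudes, whereas the covariance matrix keeps them, so heavily weighted connections are not artificially equalized with weak ones.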


Convergence Properties of the K-Means Algorithms

Neural Information Processing Systems

K-Means is a popular clustering algorithm used in many applications, including the initialization of more computationally expensive algorithms (Gaussian mixtures, Radial Basis Functions, Learning Vector Quantization and some Hidden Markov Models). The practice of this initialization procedure often gives the frustrating feeling that K-Means performs most of the task in a small fraction of the overall time. This motivated us to better understand this convergence speed. A second reason lies in the traditional debate between hard-threshold (e.g., K-Means) and soft-threshold (e.g., Gaussian mixtures) algorithms.
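
A minimal sketch of the batch K-Means iteration: assign each point to its nearest center (the hard-threshold step), then move every center to the mean of its assigned points. Each step never increases the quantization error, which is why the algorithm converges. The toy data and K are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(m, 0.3, size=(50, 2)) for m in (0.0, 2.0, 4.0)])
K = 3
centers = X[rng.choice(len(X), K, replace=False)]  # initialize from data points

for _ in range(20):
    # Hard assignment: each point goes entirely to its nearest center.
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = d.argmin(axis=1)
    # Update: each center becomes the mean of its points (kept if cluster is empty).
    centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                        else centers[k] for k in range(K)])
print(centers)
```

In a soft-threshold algorithm such as a Gaussian mixture fit by EM, `labels` would be replaced by fractional responsibilities, and each center would move toward a responsibility-weighted mean instead.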



Higher Order Statistical Decorrelation without Information Loss

Neural Information Processing Systems

A neural network learning paradigm based on information theory is proposed as a way to perform, in an unsupervised fashion, redundancy reduction among the elements of the output layer without loss of information from the sensory input. The model performs nonlinear decorrelation up to higher orders of the cumulant tensors and results in probabilistically independent components of the output layer. This means that we need not assume a Gaussian distribution at either the input or the output. The theory presented is related to the unsupervised-learning theory of Barlow, which proposes redundancy reduction as the goal of cognition. When nonlinear units are used, nonlinear principal component analysis is obtained.
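
A minimal sketch of what "decorrelation up to higher orders" means as a diagnostic: for zero-mean outputs, statistical independence drives not only the off-diagonal second-order moments E[y_i y_j] toward zero but also higher cross-cumulants such as E[y_i y_j y_k] for distinct indices. The random data below is a stand-in for network outputs; this check is illustrative, not the paper's learning rule.

```python
import numpy as np

rng = np.random.default_rng(5)
Y = rng.normal(size=(10000, 3))           # stand-in for the output layer
Y -= Y.mean(axis=0)                       # zero-mean outputs

cov = Y.T @ Y / len(Y)                                 # second-order moments
c3 = np.einsum('ni,nj,nk->ijk', Y, Y, Y) / len(Y)      # third-order moment tensor
print(np.abs(cov - np.diag(np.diag(cov))).max())       # off-diagonal 2nd order ~ 0
print(np.abs(c3).max())                                # 3rd order ~ 0 for independent,
                                                       # symmetric components
```

Gaussian-based decorrelation stops at the covariance matrix; vanishing higher-order cross-cumulants is the stronger, distribution-free condition the abstract refers to.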