History-Dependent Attractor Neural Networks

Neural Information Processing Systems

We present a methodological framework enabling a detailed description of the performance of Hopfield-like attractor neural networks (ANN) in the first two iterations. Using the Bayesian approach, we find that performance is improved when a history-based term is included in the neuron's dynamics. A further enhancement of the network's performance is achieved by judiciously choosing the censored neurons (those which become active in a given iteration) on the basis of the magnitude of their post-synaptic potentials. The contribution of biologically plausible, censored, history-dependent dynamics is especially marked in conditions of low firing activity and sparse connectivity, two important characteristics of the mammalian cortex. In such networks, the performance attained is higher than the performance of two 'independent' iterations, which represents an upper bound on the performance of history-independent networks.
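For orientation, the following is a minimal sketch of the history-independent baseline that the abstract compares against: a standard Hopfield-style network with ±1 neurons, Hebbian weights, and two synchronous iterations. It does not implement the history-based term or the censoring of neurons described above. The patterns, the probe, and the names `hebbian_weights`, `iterate`, and `overlap` are illustrative assumptions, not taken from the paper.

```python
def hebbian_weights(patterns):
    """Outer-product (Hebbian) weights with zero self-coupling."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j] / n
    return w


def iterate(w, state):
    """One synchronous update: each neuron takes the sign of its input field."""
    new_state = []
    for i, row in enumerate(w):
        field = sum(w_ij * s_j for w_ij, s_j in zip(row, state))
        if field > 0:
            new_state.append(1)
        elif field < 0:
            new_state.append(-1)
        else:
            new_state.append(state[i])  # keep the old value on a zero field
    return new_state


def overlap(a, b):
    """Normalized overlap between two +/-1 states (1.0 means identical)."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


# Two orthogonal stored patterns (hypothetical toy data).
patterns = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
w = hebbian_weights(patterns)

# Probe: the first pattern with its first bit flipped.
probe = [-1, 1, 1, 1, -1, -1, -1, -1]

state = probe
overlaps = [overlap(state, patterns[0])]
for _ in range(2):  # the "first two iterations" studied in the abstract
    state = iterate(w, state)
    overlaps.append(overlap(state, patterns[0]))

print(overlaps)
```

In this toy setting the overlap with the stored pattern is tracked across the two iterations, which is the quantity a performance analysis of this kind would follow.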


Single-Iteration Threshold Hamming Networks

Neural Information Processing Systems

The Hamming network (HN) calculates the Hamming distance between the input pattern and each memory pattern, and selects the memory with the smallest distance. It is composed of two subnets. The similarity subnet, consisting of an n-neuron input layer connected with an m-neuron memory layer, calculates the number of equal bits between the input and each memory pattern. The winner-take-all (WTA) subnet, consisting of a fully connected m-neuron topology, selects the memory neuron that best matches the input pattern.
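The two subnets described above can be sketched functionally as follows. This is only an illustration of what each subnet computes: the WTA stage is replaced by a plain argmax rather than a recurrent network, and the function names and example patterns are assumptions made for this sketch.

```python
def similarity(x, memories):
    """Similarity subnet: count equal bits between x and each memory pattern."""
    return [sum(1 for a, b in zip(x, m) if a == b) for m in memories]


def winner_take_all(scores):
    """Stand-in for the WTA subnet: index of the largest score (first on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


def hamming_network(x, memories):
    """Return the index of the best-matching memory and all similarity scores."""
    scores = similarity(x, memories)
    return winner_take_all(scores), scores


# m = 3 memory patterns over n = 8 input bits (hypothetical toy data).
memories = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 0, 1, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 1, 1, 1, 1],
]
probe = [1, 1, 1, 0, 0, 0, 0, 0]  # memory 0 with one bit flipped

winner, scores = hamming_network(probe, memories)
print(winner, scores)  # 0 [7, 5, 1]
```

Maximizing the number of equal bits is equivalent to minimizing the Hamming distance, since the distance is n minus the similarity score.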


Predicting Complex Behavior in Sparse Asymmetric Networks

Neural Information Processing Systems

Recurrent networks of threshold elements have been studied intensively as associative memories and pattern-recognition devices. Most research has concentrated on fully connected symmetric networks.


Destabilization and Route to Chaos in Neural Networks with Random Connectivity

Neural Information Processing Systems

The occurrence of chaos in recurrent neural networks is supposed to depend on the architecture and on the synaptic coupling strength. It is studied here for a randomly diluted architecture. By normalizing the variance of synaptic weights, we produce a bifurcation parameter, dependent on this variance and on the slope of the transfer function but independent of the connectivity, that allows a sustained activity and the occurrence of chaos when reaching a critical value. Even for weak connectivity and small size, we find numerical results in accordance with the theoretical ones previously established for fully connected infinite-sized networks. Moreover, the route towards chaos is numerically checked to be a quasi-periodic one, whatever the type of the first bifurcation (Hopf bifurcation, pitchfork or flip).


On the Use of Evidence in Neural Networks

Neural Information Processing Systems

The Bayesian "evidence" approximation has recently been employed to determine the noise and weight-penalty terms used in back-propagation. This paper shows that for neural nets it is far easier to use the exact result than it is to use the evidence approximation. Moreover, unlike the evidence approximation, the exact result neither has to be re-calculated for every new data set, nor requires the running of computer code (the exact result is closed form). In addition, it turns out that the evidence procedure's MAP estimate for neural nets is, in toto, approximation error. Another advantage of the exact analysis is that it does not lead one to incorrect intuition, like the claim that using evidence one can "evaluate different priors in light of the data". This paper also discusses sufficiency conditions for the evidence approximation to hold, why it can sometimes give "reasonable" results, etc.


Probability Estimation from a Database Using a Gibbs Energy Model

Neural Information Processing Systems

We present an algorithm for creating a neural network which produces accurate probability estimates as outputs. The network implements a Gibbs probability distribution model of the training database. This model is created by a new transformation relating the joint probabilities of attributes in the database to the weights (Gibbs potentials) of the distributed network model. The theory of this transformation is presented together with experimental results. One advantage of this approach is that the network weights are prescribed without iterative gradient descent. Used as a classifier, the network tied or outperformed published results on a variety of databases.


Statistical Mechanics of Learning in a Large Committee Machine

Neural Information Processing Systems

We use statistical mechanics to study generalization in large committee machines. For an architecture with nonoverlapping receptive fields a replica calculation yields the generalization error in the limit of a large number of hidden units.


Weight Space Probability Densities in Stochastic Learning: II. Transients and Basin Hopping Times

Neural Information Processing Systems

In stochastic learning, weights are random variables whose time evolution is governed by a Markov process. We summarize the theory of the time evolution of the weight-space probability density P, and give graphical examples of the time evolution that contrast the behavior of stochastic learning with true gradient descent (batch learning). Finally, we use the formalism to obtain predictions of the time required for noise-induced hopping between basins of different optima. We compare the theoretical predictions with simulations of large ensembles of networks for simple problems in supervised and unsupervised learning. Despite the recent application of convergence theorems from stochastic approximation theory to neural network learning (Oja 1982, White 1989), there remain outstanding questions about the search dynamics in stochastic learning.
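The contrast between stochastic learning and batch gradient descent can be illustrated with a toy ensemble simulation. The sketch below is not the paper's formalism: it assumes a one-dimensional quadratic cost with a single optimum, so it shows how an ensemble of weights spreads into a distribution but not hopping between basins. The data set, learning rate, and function names are hypothetical.

```python
import random
import statistics

DATA = [-1.0, 1.0, 3.0]  # toy examples; per-example cost is (w - x)**2 / 2
ETA = 0.1                # learning rate (assumed value)
STEPS = 50


def batch_descent(w0=0.0):
    """True gradient descent: every step uses the gradient of the mean cost."""
    target = statistics.mean(DATA)
    w = w0
    for _ in range(STEPS):
        w -= ETA * (w - target)
    return w


def stochastic_descent(rng, w0=0.0):
    """Stochastic learning: every step uses one randomly drawn example."""
    w = w0
    for _ in range(STEPS):
        x = rng.choice(DATA)
        w -= ETA * (w - x)
    return w


rng = random.Random(0)
# Each run is one "network"; the ensemble samples the weight distribution.
ensemble = [stochastic_descent(rng) for _ in range(500)]

batch_w = batch_descent()
ensemble_mean = statistics.mean(ensemble)
ensemble_std = statistics.pstdev(ensemble)

print(f"batch weight:  {batch_w:.4f}")
print(f"ensemble mean: {ensemble_mean:.4f}")
print(f"ensemble std:  {ensemble_std:.4f}")
```

Batch descent follows a single deterministic trajectory, whereas the stochastic runs end at different weights. A histogram of `ensemble` gives an empirical estimate of the weight density after `STEPS` updates.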


Unsupervised Discrimination of Clustered Data via Optimization of Binary Information Gain

Neural Information Processing Systems

We present the information-theoretic derivation of a learning algorithm that clusters unlabelled data with linear discriminants. In contrast to methods that try to preserve information about the input patterns, we maximize the information gained from observing the output of robust binary discriminators implemented with sigmoid nodes. We derive a local weight adaptation rule via gradient ascent in this objective, demonstrate its dynamics on some simple data sets, relate our approach to previous work, and suggest directions in which it may be extended.


Synaptic Weight Noise During MLP Learning Enhances Fault-Tolerance, Generalization and Learning Trajectory

Neural Information Processing Systems

We analyse the effects of analog noise on the synaptic arithmetic during MultiLayer Perceptron training, by expanding the cost function to include noise-mediated penalty terms. Predictions are made in the light of these calculations which suggest that fault tolerance, generalisation ability and learning trajectory should be improved by such noise-injection. Extensive simulation experiments on two distinct classification problems substantiate the claims. The results appear to be perfectly general for all training schemes where weights are adjusted incrementally, and have wide-ranging implications for all applications, particularly those involving "inaccurate" analog neural VLSI.
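As a rough illustration of noise injection during incremental training, the sketch below perturbs the weights of a single sigmoid unit on each forward pass and then applies an ordinary delta-rule update to the clean weights. The multiplicative Gaussian noise model, the single-unit network, and all names and values here are assumptions made for this example; the abstract does not specify them.

```python
import math
import random


def forward(weights, x):
    """Sigmoid unit: squashed weighted sum of the inputs."""
    return 1.0 / (1.0 + math.exp(-sum(w * xi for w, xi in zip(weights, x))))


def noisy_weights(weights, sigma, rng):
    """Multiplicative noise (assumed model): w -> w * (1 + N(0, sigma^2))."""
    return [w * (1.0 + rng.gauss(0.0, sigma)) for w in weights]


def train_step(weights, x, target, lr, sigma, rng):
    """Delta-rule step whose error is computed from a noisy forward pass."""
    y = forward(noisy_weights(weights, sigma, rng), x)
    delta = (target - y) * y * (1.0 - y)
    # The update is applied to the stored (noise-free) weights.
    return [w + lr * delta * xi for w, xi in zip(weights, x)]


# Toy OR problem; the last input is a constant bias term.
DATA = [([0, 0, 1], 0), ([0, 1, 1], 1), ([1, 0, 1], 1), ([1, 1, 1], 1)]

rng = random.Random(0)
weights = [0.1, -0.1, 0.05]  # nonzero start so multiplicative noise has an effect
for _ in range(2000):
    x, target = rng.choice(DATA)
    weights = train_step(weights, x, target, lr=0.5, sigma=0.1, rng=rng)

print([round(w, 3) for w in weights])
```

Setting `sigma=0.0` reduces `train_step` to the noise-free delta rule, which makes it straightforward to compare the two training schemes on the same data.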