Goto

Collaborating Authors

 Europe


Correlation Functions in a Large Stochastic Neural Network

Neural Information Processing Systems

In many cases the crosscorrelations between the activities of cortical neurons are approximately symmetric about zero time delay. These have been taken as an indication of the presence of "functional connectivity" between the correlated neurons (Fetz, Toyama and Smith 1991, Abeles 1991). However, a quantitative comparison between the observed cross-correlations and those expected to exist between neurons that are part of a large assembly of interacting population has been lacking. Most of the theoretical studies of recurrent neural network models consider only time averaged firing rates, which are usually given as solutions of mean-field equations. They do not account for the fluctuations about these averages, the study of which requires going beyond the mean-field approximations. In this work we perform a theoretical study of the fluctuations in the neuronal activities and their correlations, in a large stochastic network of excitatory and inhibitory neurons. Depending on the model parameters, this system can exhibit coherent undamped oscillations. Here we focus on parameter regimes where the system is in a statistically stationary state, which is more appropriate for modeling non oscillatory neuronal activity in cortex. Our results for the magnitudes and the time-dependence of the correlation functions can provide a basis for comparison with physiological data on neuronal correlation functions.


Observability of Neural Network Behavior

Neural Information Processing Systems

We prove that except possibly for small exceptional sets, discretetime analog neural nets are globally observable, i.e. all their corrupted pseudo-orbits on computer simulations actually reflect the true dynamical behavior of the network. Locally finite discrete (boolean) neural networks are observable without exception.


The Statistical Mechanics of k-Satisfaction

Neural Information Processing Systems

The satisfiability of random CNF formulae with precisely k variables per clause ("k-SAT") is a popular testbed for the performance of search algorithms. Formulae have M clauses from N variables, randomly negated, keeping the ratio a M / N fixed.


On the Non-Existence of a Universal Learning Algorithm for Recurrent Neural Networks

Neural Information Processing Systems

We prove that the so called "loading problem" for (recurrent) neural networks is unsolvable. This extends several results which already demonstrated that training and related design problems for neural networks are (at least) NPcomplete. Our result also implies that it is impossible to find or to formulate a universal training algorithm, which for any neural network architecture could determine a correct set of weights. For the simple proof of this, we will just show that the loading problem is equivalent to "Hilbert's tenth problem" which is known to be unsolvable.


Non-Linear Statistical Analysis and Self-Organizing Hebbian Networks

Neural Information Processing Systems

Linear neurons learning under an unsupervised Hebbian rule can learn to perform a linear statistical analysis ofthe input data. This was first shown by Oja (1982), who proposed a learning rule which finds the first principal component of the variance matrix of the input data. Based on this model, Oja (1989), Sanger (1989), and many others have devised numerous neural networks which find many components of this matrix. These networks perform principal component analysis (PCA), a well-known method of statistical analysis.


Discontinuous Generalization in Large Committee Machines

Neural Information Processing Systems

The problem of learning from examples in multilayer networks is studied within the framework of statistical mechanics. Using the replica formalism we calculate the average generalization error of a fully connected committee machine in the limit of a large number of hidden units. If the number of training examples is proportional to the number of inputs in the network, the generalization error as a function of the training set size approaches a finite value. If the number of training examples is proportional to the number of weights in the network we find first-order phase transitions with a discontinuous drop in the generalization error for both binary and continuous weights. 1 INTRODUCTION Feedforward neural networks are widely used as nonlinear, parametric models for the solution of classification tasks and function approximation. Trained from examples of a given task, they are able to generalize, i.e. to compute the correct output for new, unknown inputs.


Agnostic PAC-Learning of Functions on Analog Neural Nets

Neural Information Processing Systems

Abstract: There exist a number of negative results ([J), [BR), [KV]) about learning on neural nets in Valiant's model [V) for probably approximately correct learning ("PAClearning"). These negative results are based on an asymptotic analysis where one lets the number of nodes in the neural net go to infinit.y. Hence this analysis is less adequate for the investigation of learning on a small fixed neural net.


Optimal Stopping and Effective Machine Complexity in Learning

Neural Information Processing Systems

We study tltt' problem of when to stop If'arning a class of feedforward networks - networks with linear outputs I1PUrOIl and fixed input weights - when they are trained with a gradient descent algorithm on a finite number of examples. Under general regularity conditions, it is shown that there a.re in general three distinct phases in the generalization performance in the learning process, and in particular, the network has hetter gt'neralization pPTformance when learning is stopped at a certain time before til(' global miniIl111lu of the empirical error is reachert. A notion of effective size of a machine is rtefil1e i and used to explain the tradeoff betwf'en the complexity of the marhine and the training error ill the learning process. The study leads nat.urally to a network size selection critt'rion, which turns Ol1t to be a generalization of Akaike's Information Criterioll for the It'arning process. It if; shown that stopping Iparning before tiJt' global minimum of the empirical error has the effect of network size splectioll. 1 INTRODUCTION The primary goal of learning in neural nets is to find a network that gives valid generalization. In achieving this goal, a central issue is the tradeoff between the training error and network complexity. This usually reduces to a problem of network size selection, which has drawn much research effort in recent years. Various principles, theories, and intuitions, including Occam's razor, statistical model selection criteria such as Akaike's Information Criterion (AIC) [11 and many others [5, 1, 10,3,111 all quantitatively support the following PAC prescription: between two machines which have the same empirical error, the machine with smaller VC-dimf'nsion generalizes better. However, it is noted that these methods or criteria do not npcpssarily If'ad to optimal (or llearly optimal) generalization performance.


Supervised Learning with Growing Cell Structures

Neural Information Processing Systems

Feed-forward networks of localized (e.g., Gaussian) units are an interesting alternative to the more frequently used networks of global (e.g., sigmoidal) units. It has been shown that with localized units one hidden layer suffices in principle to approximate any continuous function, whereas with sigmoidal units two layers are necessary. In the following we are considering radial basis function networks similar to those proposed by Moody & Darken (1989) or Poggio & Girosi (1990). Such networks consist of one layer L of Gaussian units.


Adaptive knot Placement for Nonparametric Regression

Neural Information Processing Systems

We show how an "Elman" network architecture, constructed from recurrently connected oscillatory associative memory network modules, can employ selective "attentional" control of synchronization to direct the flow of communication and computation within the architecture to solve a grammatical inference problem. Previously we have shown how the discrete time "Elman" network algorithm can be implemented in a network completely described by continuous ordinary differential equations. The time steps (machine cycles) of the system are implemented by rhythmic variation (clocking) of a bifurcation parameter. In this architecture, oscillation amplitude codes the information content or activity of a module (unit), whereas phase and frequency are used to "softwire" the network. Only synchronized modules communicate by exchanging amplitude information; the activity of non-resonating modules contributes incoherent crosstalk noise. Attentional control is modeled as a special subset of the hidden modules with ouputs which affect the resonant frequencies of other hidden modules. They control synchrony among the other modules and direct the flow of computation (attention) to effect transitions between two subgraphs of a thirteen state automaton which the system emulates to generate a Reber grammar. The internal crosstalk noise is used to drive the required random transitions of the automaton.