Country
Dynamics of On-Line Gradient Descent Learning for Multilayer Neural Networks
We consider the problem of online gradient descent learning for general two-layer neural networks. An analytic solution is presented and used to investigate the role of the learning rate in controlling the evolution and convergence of the learning process. Two-layer networks with an arbitrary number of hidden units have been shown to be universal approximators [1] for such N-to-one dimensional maps. We investigate the emergence of generalization ability in an online learning scenario [2], in which the couplings are modified after the presentation of each example so as to minimize the corresponding error. The resulting changes in {J} are described as a dynamical evolution; the number of examples plays the role of time.
Optimization Principles for the Neural Code
Recent experiments show that the neural codes at work in a wide range of creatures share some common features. At first sight, these observations seem unrelated. However, we show that these features arise naturally in a linear filtered threshold crossing (LFTC) model when we set the threshold to maximize the transmitted information. This maximization process requires neural adaptation to not only the DC signal level, as in conventional light and dark adaptation, but also to the statistical structure of the signal and noise distributions. We also present a new approach for calculating the mutual information between a neuron's output spike train and any aspect of its input signal which does not require reconstruction of the input signal. This formulation is valid provided the correlations in the spike train are small, and we provide a procedure for checking this assumption. This paper is based on joint work (DeWeese [1], 1995). Preliminary results from the LFTC model appeared in a previous proceedings (DeWeese [2], 1995), and the conclusions we reached at that time have been reaffirmed by further analysis of the model.
Modern Analytic Techniques to Solve the Dynamics of Recurrent Neural Networks
Coolen, A.C.C., Laughton, S. N., Sherrington, D.
We describe the use of modern analytical techniques in solving the dynamics of symmetric and nonsymmetric recurrent neural networks near saturation. These explicitly take into account the correlations between the post-synaptic potentials, and thereby allow for a reliable prediction of transients. 1 INTRODUCTION Recurrent neural networks have been rather popular in the physics community, because they lend themselves so naturally to analysis with tools from equilibrium statistical mechanics. This was the main theme of physicists between, say, 1985 and 1990. Less familiar to the neural network community is a subsequent wave of theoretical physical studies, dealing with the dynamics of symmetric and nonsymmetric recurrent networks. The strategy here is to try to describe the processes at a reduced level of an appropriate small set of dynamic macroscopic observables.
On Neural Networks with Minimal Weights
Bohossian, Vasken, Bruck, Jehoshua
Linear threshold elements are the basic building blocks of artificial neural networks. A linear threshold element computes a function that is a sign of a weighted sum of the input variables. The weights are arbitrary integers; actually, they can be very big integers-exponential in the number of the input variables. However, in practice, it is difficult to implement big weights. In the present literature a distinction is made between the two extreme cases: linear threshold functions with polynomial-size weights as opposed to those with exponential-size weights.
Neural Networks with Quadratic VC Dimension
Koiran, Pascal, Sontag, Eduardo D.
A set of labeled training samples is provided, and a network must be obtained which is then expected to correctly classify previously unseen inputs. In this context, a central problem is to estimate the amount of training data needed to guarantee satisfactory learning performance. To study this question, it is necessary to first formalize the notion of learning from examples. One such formalization is based on the paradigm of probably approximately correct (PAC) learning, due to Valiant (1984). In this framework, one starts by fitting some function /, chosen from a predetermined class F, to the given training data. The class F is often called the "hypothesis class", and for purposes of this discussion it will be assumed that the functions in F take binary values {O, I} and are defined on a common domain X.