Bayesian Backprop in Action: Pruning, Committees, Error Bars and an Application to Spectroscopy
MacKay's Bayesian framework for backpropagation is conceptually appealing as well as practical. It automatically adjusts the weight decay parameters during training, and computes the evidence for each trained network. The evidence is proportional to our belief in the model. In this paper, the framework is extended to pruned nets, leading to an Ockham Factor for "tuning the architecture to the data". A committee of networks, selected by their high evidence, is a natural Bayesian construction.
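The committee construction can be read as weighting each trained network by its posterior probability, which is proportional to its evidence. Below is a minimal sketch of that idea, assuming we already have per-network log evidences and predictions; all names and values are illustrative placeholders, not the paper's code.

```python
import numpy as np

# Hypothetical per-network log evidences and predictions (placeholders).
log_evidence = np.array([-120.3, -118.7, -119.5])   # one value per trained net
predictions = np.array([[0.82], [0.79], [0.85]])     # each net's output for one input

# Weight each committee member by its normalized evidence,
# i.e. a softmax over the log evidence values.
weights = np.exp(log_evidence - log_evidence.max())
weights /= weights.sum()

# Committee prediction: evidence-weighted average of the member outputs.
committee_output = (weights[:, None] * predictions).sum(axis=0)
print(weights, committee_output)
```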
Discontinuous Generalization in Large Committee Machines
The problem of learning from examples in multilayer networks is studied within the framework of statistical mechanics. Using the replica formalism we calculate the average generalization error of a fully connected committee machine in the limit of a large number of hidden units. If the number of training examples is proportional to the number of inputs in the network, the generalization error as a function of the training set size approaches a finite value. If the number of training examples is proportional to the number of weights in the network we find first-order phase transitions with a discontinuous drop in the generalization error for both binary and continuous weights.

1 INTRODUCTION

Feedforward neural networks are widely used as nonlinear, parametric models for the solution of classification tasks and function approximation. Trained from examples of a given task, they are able to generalize, i.e. to compute the correct output for new, unknown inputs.
Learning Curves: Asymptotic Values and Rate of Convergence
Cortes, Corinna, Jackel, L. D., Solla, Sara A., Vapnik, Vladimir, Denker, John S.
Training classifiers on large databases is computationally demanding. It is desirable to develop efficient procedures for a reliable prediction of a classifier's suitability for implementing a given task, so that resources can be assigned to the most promising candidates or freed for exploring new classifier candidates. We propose such a practical and principled predictive method. Practical because it avoids the costly procedure of training poor classifiers on the whole training set, and principled because of its theoretical foundation. The effectiveness of the proposed procedure is demonstrated for both single- and multi-layer networks.
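One common way to realize this kind of prediction (a sketch of the general idea, not necessarily the exact procedure of the paper) is to measure test error at a few small training-set sizes, fit a power-law learning curve, and extrapolate its asymptote; the data points below are invented for illustration.

```python
import numpy as np
from scipy.optimize import curve_fit

# Hypothetical measured test errors at small training-set sizes (placeholders).
n_train = np.array([100, 200, 400, 800, 1600], dtype=float)
test_err = np.array([0.31, 0.26, 0.22, 0.19, 0.17])

# Power-law learning-curve model: E(n) ~ a + b * n**(-alpha),
# where `a` is the asymptotic error reached with unlimited data.
def learning_curve(n, a, b, alpha):
    return a + b * n ** (-alpha)

(a, b, alpha), _ = curve_fit(learning_curve, n_train, test_err, p0=(0.1, 1.0, 0.5))
print(f"estimated asymptotic error: {a:.3f}, rate exponent: {alpha:.2f}")
print(f"predicted error at n=100000: {learning_curve(1e5, a, b, alpha):.3f}")
```

Classifiers whose extrapolated asymptote is poor can then be discarded before anyone pays for training them on the full database.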
Connectionism for Music and Audition
In recent years, NIPS has heard neural networks generate tunes and harmonize chorales. With a large amount of music becoming available in computer readable form, real data can be used to train connectionist models. At the beginning of this workshop, Andreas Weigend focused on architectures to capture structure on multiple time scales.
Observability of Neural Network Behavior
Garzon, Max, Botelho, Fernanda
We prove that, except possibly for small exceptional sets, discrete-time analog neural nets are globally observable, i.e. all their corrupted pseudo-orbits on computer simulations actually reflect the true dynamical behavior of the network. Locally finite discrete (Boolean) neural networks are observable without exception.
Correlation Functions in a Large Stochastic Neural Network
Ginzburg, Iris, Sompolinsky, Haim
In many cases the cross-correlations between the activities of cortical neurons are approximately symmetric about zero time delay. These have been taken as an indication of the presence of "functional connectivity" between the correlated neurons (Fetz, Toyama and Smith 1991; Abeles 1991). However, a quantitative comparison between the observed cross-correlations and those expected to exist between neurons that are part of a large assembly of interacting neurons has been lacking. Most of the theoretical studies of recurrent neural network models consider only time-averaged firing rates, which are usually given as solutions of mean-field equations. They do not account for the fluctuations about these averages, the study of which requires going beyond the mean-field approximations. In this work we perform a theoretical study of the fluctuations in the neuronal activities and their correlations, in a large stochastic network of excitatory and inhibitory neurons. Depending on the model parameters, this system can exhibit coherent undamped oscillations. Here we focus on parameter regimes where the system is in a statistically stationary state, which is more appropriate for modeling non-oscillatory neuronal activity in cortex. Our results for the magnitudes and the time dependence of the correlation functions can provide a basis for comparison with physiological data on neuronal correlation functions.
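As context for the quantity being modeled, a cross-correlation function between two activity traces can be estimated directly from data; the sketch below uses simulated rate traces, not the paper's model, and every parameter is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two simulated firing-rate traces sharing a fluctuating input, standing in
# for the activities of two neurons embedded in a large network.
T = 5000
common = rng.normal(size=T)
rate1 = common + rng.normal(size=T)
rate2 = np.roll(common, 3) + rng.normal(size=T)   # neuron 2 lags neuron 1 by 3 bins

def cross_correlation(x, y, max_lag=20):
    """Normalized cross-correlation C(tau) = <dx(t) dy(t+tau)>."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    lags = np.arange(-max_lag, max_lag + 1)
    c = []
    for tau in lags:
        if tau >= 0:
            c.append(np.mean(x[: T - tau] * y[tau:]))
        else:
            c.append(np.mean(x[-tau:] * y[: T + tau]))
    return lags, np.array(c)

lags, c = cross_correlation(rate1, rate2)
print(lags[np.argmax(c)])   # peak near +3: neuron 2 lags neuron 1
```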
Two-Dimensional Object Localization by Coarse-to-Fine Correlation Matching
Lu, Chien-Ping, Mjolsness, Eric
We present a Mean Field Theory method for locating two-dimensional objects that have undergone rigid transformations. The resulting algorithm is a form of coarse-to-fine correlation matching. We first consider problems of matching synthetic point data, and derive a point matching objective function. A tractable line segment matching objective function is derived by considering each line segment as a dense collection of points, and approximating it by a sum of Gaussians. The algorithm is tested on real images from which line segments are extracted and matched.

1 Introduction

Assume that an object in a scene can be viewed as an instance of the model placed in space by some spatial transformation, and object recognition is achieved by discovering an instance of the model in the scene.
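To make the flavor of coarse-to-fine correlation matching concrete, here is a small sketch, not the authors' implementation: it builds a point-matching objective as a sum of Gaussians and anneals the Gaussian width from coarse to fine, but it swaps the Mean Field Theory optimization for simple random hill-climbing over a 2-D rigid transformation.

```python
import numpy as np

def match_score(model, scene, theta, t, sigma):
    """Sum-of-Gaussians correlation between transformed model points and scene points."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    transformed = model @ R.T + t
    d2 = ((transformed[:, None, :] - scene[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2)).sum()

def coarse_to_fine_match(model, scene, sigmas=(8.0, 4.0, 2.0, 1.0), n_steps=200):
    """Anneal sigma from coarse to fine; at each level, hill-climb over (theta, tx, ty)."""
    rng = np.random.default_rng(0)
    theta, t = 0.0, scene.mean(0) - model.mean(0)   # crude initialization
    for sigma in sigmas:
        best = match_score(model, scene, theta, t, sigma)
        for _ in range(n_steps):
            th = theta + rng.normal(0, 0.05)
            tt = t + rng.normal(0, 0.5, size=2)
            s = match_score(model, scene, th, tt, sigma)
            if s > best:
                best, theta, t = s, th, tt
    return theta, t

# Toy example: the scene is the model rotated by 30 degrees and shifted.
model = np.random.default_rng(1).uniform(0, 10, size=(20, 2))
ang = np.pi / 6
R = np.array([[np.cos(ang), -np.sin(ang)], [np.sin(ang), np.cos(ang)]])
scene = model @ R.T + np.array([5.0, -3.0])
print(coarse_to_fine_match(model, scene))
```

The broad Gaussians at the start smooth the objective so the search is not trapped by nearby wrong matches; the narrow Gaussians at the end localize the transformation precisely.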
Structural and Behavioral Evolution of Recurrent Networks
Saunders, Gregory M., Angeline, Peter J., Pollack, Jordan B.
This paper introduces GNARL, an evolutionary program which induces recurrent neural networks that are structurally unconstrained. In contrast to constructive and destructive algorithms, GNARL employs a population of networks and uses a fitness function's unsupervised feedback to guide search through network space. Annealing is used in generating both Gaussian weight changes and structural modifications. Applying GNARL to a complex search and collection task demonstrates that the system is capable of inducing networks with complex internal dynamics.
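A hedged sketch of the annealing idea described above, with names, encodings, and constants that are illustrative rather than GNARL's actual code: a mutation temperature tied to how poorly a network performs scales both the Gaussian weight perturbations and the number of structural changes.

```python
import copy
import numpy as np

rng = np.random.default_rng(0)

def mutate(net, fitness, max_fitness, weight_scale=1.0, max_structural=4):
    """Anneal mutation severity: worse networks (low fitness) are mutated more strongly."""
    temperature = 1.0 - fitness / max_fitness            # in [0, 1]
    child = copy.deepcopy(net)

    # Parametric mutation: Gaussian noise on every weight, scaled by temperature.
    for conn in child["connections"]:
        conn["weight"] += rng.normal(0.0, weight_scale * temperature)

    # Structural mutation: add or remove a few connections, count scaled by temperature.
    for _ in range(rng.integers(0, int(max_structural * temperature) + 1)):
        if child["connections"] and rng.random() < 0.5:
            child["connections"].pop(rng.integers(len(child["connections"])))
        else:
            src, dst = rng.integers(child["n_nodes"], size=2)   # recurrent links allowed
            child["connections"].append({"src": int(src), "dst": int(dst), "weight": 0.0})
    return child

# Toy usage: a small recurrent net encoded as a dict of connections.
net = {"n_nodes": 5,
       "connections": [{"src": 0, "dst": 1, "weight": 0.3},
                       {"src": 1, "dst": 1, "weight": -0.7}]}
print(mutate(net, fitness=2.0, max_fitness=10.0))
```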
Unsupervised Learning of Mixtures of Multiple Causes in Binary Data
This paper presents a formulation for unsupervised learning of clusters reflecting multiple causal structure in binary data. Unlike the standard mixture model, a multiple cause model accounts for observed data by combining assertions from many hidden causes, each of which can pertain to varying degree to any subset of the observable dimensions. A crucial issue is the mixing-function for combining beliefs from different cluster-centers in order to generate data reconstructions whose errors are minimized both during recognition and learning. We demonstrate a weakness inherent to the popular weighted sum followed by sigmoid squashing, and offer an alternative form of the nonlinearity. Results are presented demonstrating the algorithm's ability successfully to discover coherent multiple causal representations in noisy test data and in images of printed characters.

1 Introduction

The objective of unsupervised learning is to identify patterns or features reflecting underlying regularities in data. Single-cause techniques, including the k-means algorithm and the standard mixture model (Duda and Hart, 1973), represent clusters of data points sharing similar patterns of 1s and 0s under the assumption that each data point belongs to, or was generated by, one and only one cluster-center; output activity is constrained to sum to 1. In contrast, a multiple-cause model permits more than one cluster-center to become fully active in accounting for an observed data vector. The advantage of a multiple cause model is that a relatively small number of hidden variables can be applied combinatorially to generate a large data set.
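To illustrate the mixing-function issue discussed above, the sketch below contrasts the weighted sum followed by a sigmoid with a noisy-OR-style combination. The noisy-OR form is one standard multiple-cause mixing function, shown here as an example rather than necessarily the exact nonlinearity proposed in the paper, and all activities and weights are invented for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# m[k]   : activity of hidden cause k (in [0, 1])
# c[k, j]: how strongly cause k asserts that observed bit j is on (in [0, 1])
m = np.array([1.0, 1.0, 0.0])
c = np.array([[0.9, 0.9, 0.0, 0.0],
              [0.0, 0.9, 0.9, 0.0],
              [0.0, 0.0, 0.0, 0.9]])

# Weighted sum followed by a sigmoid: when two causes both assert bit 1,
# their contributions add, so the reconstruction depends on how many causes
# happen to overlap rather than on what any one of them asserts.
r_sigmoid = sigmoid(m @ c)

# Noisy-OR style mixing: bit j stays off only if every active cause fails to
# turn it on, so overlapping causes combine without their assertions summing.
r_noisy_or = 1.0 - np.prod(1.0 - m[:, None] * c, axis=0)

print("weighted-sum + sigmoid:", np.round(r_sigmoid, 2))
print("noisy-OR mixing:       ", np.round(r_noisy_or, 2))
```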