Bayesian Backprop in Action: Pruning, Committees, Error Bars and an Application to Spectroscopy

Neural Information Processing Systems

MacKay's Bayesian framework for backpropagation is conceptually appealing as well as practical. It automatically adjusts the weight decay parameters during training, and computes the evidence for each trained network. The evidence is proportional to our belief in the model. In this paper, the framework is extended to pruned nets, leading to an Ockham Factor for "tuning the architecture to the data". A committee of networks, selected by their high evidence, is a natural Bayesian construction.
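
A minimal sketch of how such an evidence-weighted committee could be formed, assuming the per-network log evidences have already been computed by the framework (function and variable names below are illustrative, not taken from the paper):

```python
import numpy as np

def committee_predict(predictions, log_evidences):
    """Weight each committee member's prediction by its normalized evidence.
    `predictions` has shape (n_networks, n_points); `log_evidences` holds one
    log-evidence value per trained network."""
    log_ev = np.asarray(log_evidences, dtype=float)
    weights = np.exp(log_ev - log_ev.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ np.asarray(predictions)

# Example: three hypothetical trained networks evaluated at five test inputs.
preds = np.random.randn(3, 5)
print(committee_predict(preds, log_evidences=[-120.0, -118.5, -125.0]))
```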


Bayesian Self-Organization

Neural Information Processing Systems

Recent work by Becker and Hinton (Becker and Hinton, 1992) shows a promising mechanism, based on maximizing mutual information assuming spatial coherence, by which a system can self-organize to learn visual abilities such as binocular stereo. We introduce a more general criterion, based on Bayesian probability theory, and thereby demonstrate a connection to Bayesian theories of visual perception and to other organization principles for early vision (Atick and Redlich, 1990). Methods for implementation using variants of stochastic learning are described and, for the special case of linear filtering, we derive an analytic expression for the output. The input intensity patterns received by the human visual system are typically complicated functions of the object surfaces and light sources in the world. Thus the visual system must be able to extract information from the input intensities that is relatively independent of the actual intensity values.
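
The Becker-Hinton criterion that this work generalizes has a compact form for continuous outputs; the sketch below assumes Gaussian signals and only illustrates that starting point, not the paper's Bayesian criterion (names and constants are placeholders):

```python
import numpy as np

def coherence_information(a, b, eps=1e-12):
    """Under a Gaussian assumption, the information two module outputs carry
    about the signal common to neighbouring patches is
    0.5 * log( Var(a + b) / Var(a - b) )."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 0.5 * np.log(np.var(a + b) / (np.var(a - b) + eps))

# Two noisy views of one signal score high; unrelated signals score near zero.
s = np.random.randn(1000)
print(coherence_information(s + 0.3 * np.random.randn(1000),
                            s + 0.3 * np.random.randn(1000)))
print(coherence_information(np.random.randn(1000), np.random.randn(1000)))
```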


Bayesian Backpropagation Over I-O Functions Rather Than Weights

Neural Information Processing Systems

The conventional Bayesian justification of backprop is that it finds the MAP weight vector. As this paper shows, to find the MAP I-O function instead one must add a correction term to backprop. That term biases one towards I-O functions with small description lengths, and in particular favors (some kinds of) feature selection, pruning, and weight sharing.
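
As a rough illustration of the change-of-variables reasoning behind such a correction term, assume (for this sketch only) a locally invertible weight-to-function map $\Phi$; the paper itself treats the general, non-invertible case:

$$
p(f \mid D) \;=\; \frac{p(w \mid D)}{\bigl|\det J_\Phi(w)\bigr|}, \qquad f = \Phi(w),
$$

so that

$$
-\log p(f \mid D) \;=\; -\log p(w \mid D) \;+\; \log\bigl|\det J_\Phi(w)\bigr|,
$$

and gradient descent on the left-hand side amounts to ordinary backprop plus the gradient of a log-Jacobian term.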


Bayesian Modeling and Classification of Neural Signals

Neural Information Processing Systems

Signal processing and classification algorithms often have limited applicability resulting from an inaccurate model of the signal's underlying structure. We present here an efficient, Bayesian algorithm for modeling a signal composed of the superposition of brief, Poisson-distributed functions. This methodology is applied to the specific problem of modeling and classifying extracellular neural waveforms which are composed of a superposition of an unknown number of action potentials (APs). Previous approaches have had limited success due largely to the problems of determining the spike shapes, deciding how many shapes are distinct, and decomposing overlapping APs. A Bayesian solution to each of these problems is obtained by inferring a probabilistic model of the waveform. This approach quantifies the uncertainty of the form and number of the inferred AP shapes and is used to obtain an efficient method for decomposing complex overlaps. This algorithm can extract many times more information than previous methods and facilitates the extracellular investigation of neuronal classes and of interactions within neuronal circuits.
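
A toy sketch of the kind of generative model the abstract describes, i.e. a brief spike shape occurring at Poisson-distributed times superposed on background noise (all names and constants are illustrative; this is the forward model, not the paper's inference algorithm):

```python
import numpy as np

def simulate_waveform(duration_s, rate_hz, spike_shape, fs=20000,
                      noise_std=0.1, rng=None):
    """Superpose a brief action-potential shape at Poisson-distributed times
    onto Gaussian background noise."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(duration_s * fs)
    trace = noise_std * rng.standard_normal(n)
    n_spikes = rng.poisson(rate_hz * duration_s)
    for start in rng.integers(0, n - len(spike_shape), size=n_spikes):
        trace[start:start + len(spike_shape)] += spike_shape
    return trace

# A crude biphasic spike shape, purely illustrative.
t = np.linspace(0.0, 1.0, 30)
spike = np.sin(2 * np.pi * t) * np.exp(-4.0 * t)
waveform = simulate_waveform(duration_s=1.0, rate_hz=20.0, spike_shape=spike)
```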


Putting It All Together: Methods for Combining Neural Networks

Neural Information Processing Systems

The past several years have seen a tremendous growth in the complexity of the recognition, estimation and control tasks expected of neural networks. In solving these tasks, one is faced with a large variety of learning algorithms and a vast selection of possible network architectures. After all the training, how does one know which is the best network? This decision is further complicated by the fact that standard techniques can be severely limited by problems such as over-fitting, data sparsity and local optima. The usual solution to these problems is a winner-take-all cross-validatory model selection. However, recent experimental and theoretical work indicates that we can improve performance by considering methods for combining neural networks.
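
A minimal sketch contrasting winner-take-all cross-validatory selection with the simplest combination scheme, uniform averaging (names and numbers are illustrative; the paper surveys richer weighting methods):

```python
import numpy as np

def winner_take_all(val_errors, test_preds):
    """Cross-validatory model selection: keep only the network with the
    lowest validation error."""
    return np.asarray(test_preds, dtype=float)[int(np.argmin(val_errors))]

def ensemble_average(test_preds, weights=None):
    """Combination: average all networks instead of discarding the runners-up."""
    test_preds = np.asarray(test_preds, dtype=float)
    if weights is None:
        weights = np.full(len(test_preds), 1.0 / len(test_preds))
    return weights @ test_preds

# Three hypothetical networks, each predicting the same four test points.
preds = [[1.0, 0.2, 0.8, 0.5],
         [0.9, 0.3, 0.7, 0.6],
         [1.1, 0.1, 0.9, 0.4]]
print(winner_take_all([0.12, 0.15, 0.11], preds))
print(ensemble_average(preds))
```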


Learning in Compositional Hierarchies: Inducing the Structure of Objects from Data

Neural Information Processing Systems

Model-based object recognition solves the problem of invariant recognition by relying on stored prototypes at unit scale positioned at the origin of an object-centered coordinate system. Elastic matching techniques are used to find a correspondence between features of the stored model and the data and can also compute the parameters of the transformation the observed instance has undergone relative to the stored model.
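
The transformation-recovery step mentioned above can be illustrated with a least-squares (Procrustes-style) fit once a correspondence is given; this sketch is not the paper's elastic matching procedure, which must also find the correspondence itself, and it ignores reflections:

```python
import numpy as np

def fit_similarity_transform(model_pts, data_pts):
    """Recover scale, rotation and translation mapping corresponding 2-D
    prototype points onto observed data points by least squares."""
    model_pts = np.asarray(model_pts, dtype=float)
    data_pts = np.asarray(data_pts, dtype=float)
    m = model_pts - model_pts.mean(axis=0)
    d = data_pts - data_pts.mean(axis=0)
    u, s, vt = np.linalg.svd(d.T @ m)
    rot = u @ vt                              # optimal rotation (no reflection check)
    scale = s.sum() / (m ** 2).sum()
    trans = data_pts.mean(axis=0) - scale * model_pts.mean(axis=0) @ rot.T
    return scale, rot, trans

# Recover a known transform from noiseless corresponding points.
model = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
data = 2.0 * model @ R.T + np.array([3.0, -1.0])
print(fit_similarity_transform(model, data))
```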


Probabilistic Anomaly Detection in Dynamic Systems

Neural Information Processing Systems

This paper describes probabilistic methods for novelty detection when using pattern recognition methods for fault monitoring of dynamic systems. The problem of novelty detection is particularly acute when prior knowledge and training data only allow one to construct an incomplete classification model. Allowance must be made in model design so that the classifier will be robust to data generated by classes not included in the training phase. For diagnosis applications, one practical approach is to construct both an input density model and a discriminative class model. Using Bayes' rule and prior estimates of the relative likelihood of data of known and unknown origin, the resulting classification equations are straightforward.
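
A minimal sketch of the Bayes'-rule combination the abstract alludes to, with a density model for known data and an assumed constant density and prior for data of unknown origin (all names, densities and priors here are placeholders, not the paper's models):

```python
import numpy as np

def p_known(x, density_known, prior_known=0.99, density_unknown=1e-3):
    """Posterior probability that x was generated by a known class."""
    px_known = density_known(x)
    evidence = px_known * prior_known + density_unknown * (1.0 - prior_known)
    return px_known * prior_known / evidence

# Example with a unit-Gaussian density model for the known classes.
gauss = lambda x: np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
print(p_known(0.5, gauss))   # typical input -> probability near 1
print(p_known(6.0, gauss))   # outlier       -> probability near 0
```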