Generalized Belief Propagation

Neural Information Processing Systems

For general networks with loops, the situation is much less clear. On the one hand, a number of researchers have empirically demonstrated good performance for BP algorithms applied to networks with loops. One dramatic case is the near Shannon-limit performance of "Turbo codes", whose decoding algorithm is equivalent to BP on a loopy network [2, 6]. For some problems in computer vision involving networks with loops, BP has also been shown to be accurate and to converge very quickly [2, 1, 7]. On the other hand, for other networks with loops, BP may give poor results or fail to converge [7]. For a general graph, little has been understood about what approximation BP represents, and how it might be improved. This paper's goal is to provide that understanding and introduce a set of new algorithms resulting from that understanding. We show that BP is the first in a progression of local message-passing algorithms, each giving equivalent results to a corresponding approximation from statistical physics known as the "Kikuchi" approximation to the Gibbs free energy. These algorithms have the attractive property of being user-adjustable: by paying some additional computational cost, one can obtain considerable improvement in the accuracy of one's approximation, and can sometimes obtain a convergent message-passing algorithm when ordinary BP does not converge.
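As a point of reference for the message-passing idea described above, here is a minimal sketch of ordinary (not generalized) sum-product BP on a pairwise Markov random field with discrete states. The function names `loopy_bp` and `exact_marginals`, the dictionary-based graph encoding, and the synchronous update schedule are all assumptions made for illustration; they are not taken from the paper, and the Kikuchi-based algorithms it introduces are not shown.

```python
import itertools


def loopy_bp(n_states, unary, pairwise, iters=50):
    """Sum-product BP with synchronous updates (illustrative sketch).

    unary:    {node: [phi(x) for each state x]}
    pairwise: {(i, j): table} with table[xi][xj] = psi(xi, xj);
              each undirected edge is assumed to be listed once.
    """
    # One message per directed edge, initialised to uniform.
    msgs = {}
    for (i, j) in pairwise:
        msgs[(i, j)] = [1.0 / n_states] * n_states
        msgs[(j, i)] = [1.0 / n_states] * n_states

    def psi(i, j, xi, xj):
        if (i, j) in pairwise:
            return pairwise[(i, j)][xi][xj]
        return pairwise[(j, i)][xj][xi]

    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            out = []
            for xj in range(n_states):
                total = 0.0
                for xi in range(n_states):
                    prod = unary[i][xi] * psi(i, j, xi, xj)
                    # Multiply in messages arriving at i from all neighbours except j.
                    for (k, t) in msgs:
                        if t == i and k != j:
                            prod *= msgs[(k, t)][xi]
                    total += prod
                out.append(total)
            z = sum(out)
            new[(i, j)] = [v / z for v in out]
        msgs = new

    # Belief at a node: local evidence times all incoming messages, normalised.
    beliefs = {}
    for i in unary:
        b = list(unary[i])
        for (k, t) in msgs:
            if t == i:
                b = [b[x] * msgs[(k, t)][x] for x in range(n_states)]
        z = sum(b)
        beliefs[i] = [v / z for v in b]
    return beliefs


def exact_marginals(n_states, unary, pairwise):
    """Brute-force marginals by enumerating every joint state (tiny graphs only)."""
    nodes = sorted(unary)
    marg = {i: [0.0] * n_states for i in nodes}
    for assign in itertools.product(range(n_states), repeat=len(nodes)):
        x = dict(zip(nodes, assign))
        w = 1.0
        for i in nodes:
            w *= unary[i][x[i]]
        for (i, j), table in pairwise.items():
            w *= table[x[i]][x[j]]
        for i in nodes:
            marg[i][x[i]] += w
    z = sum(marg[nodes[0]])
    return {i: [v / z for v in marg[i]] for i in nodes}
```

On a graph without loops, such as a three-node chain, the beliefs from `loopy_bp` should match `exact_marginals`. Once an edge closes a loop, the beliefs are only an approximation, which is the regime the abstract is concerned with.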


Overfitting in Neural Nets: Backpropagation, Conjugate Gradient, and Early Stopping

Neural Information Processing Systems

The conventional wisdom is that backprop nets with excess hidden units generalize poorly. We show that nets with excess capacity generalize well when trained with backprop and early stopping. Experiments suggest two reasons for this: 1) Overfitting can vary significantly in different regions of the model. Excess capacity allows better fit to regions of high non-linearity, and backprop often avoids overfitting the regions of low non-linearity.
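Early stopping, as mentioned above, can be sketched as a loop that keeps the parameters with the lowest validation loss. The function name `train_with_early_stopping`, the `step` and `val_loss` callables, and the patience-based stopping criterion are assumptions for illustration; the abstract does not say which stopping rule the authors used.

```python
def train_with_early_stopping(step, val_loss, max_epochs=1000, patience=10):
    """Run `step()` once per epoch and keep the state with the best validation loss.

    step:     hypothetical callable that trains for one epoch and returns the model state
    val_loss: hypothetical callable mapping a model state to its validation loss
    patience: stop after this many consecutive epochs without improvement (assumed rule)
    """
    best_loss = float("inf")
    best_epoch = 0
    best_state = None
    for epoch in range(1, max_epochs + 1):
        state = step()
        loss = val_loss(state)
        if loss < best_loss:
            # New best: remember this state, epoch, and loss.
            best_loss, best_epoch, best_state = loss, epoch, state
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return best_state, best_epoch, best_loss
```

In practice `step` would have to return a copy of the weights rather than a reference to a mutable object, so that later epochs do not overwrite the saved best state.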


On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems

Neural Information Processing Systems

Our algorithm exploits the special structure of the sum of squared error measure in Equation (1); hence, other objective functions are outside the scope of this paper. The gradient vector and Hessian matrix are given by $g \equiv g(\theta) = J^T r$ and $H \equiv H(\theta) = J^T J + S$, where $J$ is the $m \times n$ Jacobian matrix of $r$, and $S$ denotes the matrix of second-derivative terms. If $S$ is simply omitted based on the "small residual" assumption, then the Hessian matrix reduces to the Gauss-Newton model Hessian, i.e., $J^T J$. Furthermore, a family of quasi-Newton methods can be applied to approximate the term $S$ alone, leading to the augmented Gauss-Newton model Hessian (see, for example, Mizutani [2] and references therein).
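The gradient and Gauss-Newton model Hessian described above can be illustrated on a toy problem. This is a minimal sketch that assumes the objective is $E(\theta) = \frac{1}{2} r^T r$ (Equation (1) is not reproduced here) and uses a hypothetical two-parameter model $a e^{bx}$ in place of a neural network. All function names are illustrative, and the Krylov-dogleg trust-region step itself is not shown.

```python
import math


def residuals(theta, xs, ys):
    """r_i = a * exp(b * x_i) - y_i for the hypothetical model a * exp(b * x)."""
    a, b = theta
    return [a * math.exp(b * x) - y for x, y in zip(xs, ys)]


def jacobian(theta, xs):
    """m x n Jacobian of r: row i holds [dr_i/da, dr_i/db]."""
    a, b = theta
    return [[math.exp(b * x), a * x * math.exp(b * x)] for x in xs]


def gradient(J, r):
    """g = J^T r."""
    n = len(J[0])
    return [sum(J[i][j] * r[i] for i in range(len(r))) for j in range(n)]


def gauss_newton_hessian(J):
    """J^T J: the Hessian with the second-derivative term S omitted."""
    n = len(J[0])
    return [[sum(row[j] * row[k] for row in J) for k in range(n)]
            for j in range(n)]


def sse(theta, xs, ys):
    """Assumed objective E = 0.5 * sum of squared residuals."""
    return 0.5 * sum(ri * ri for ri in residuals(theta, xs, ys))
```

When the residuals are exactly zero, the gradient vanishes. The omitted term $S$ also vanishes there, since it is weighted by the residuals, so $J^T J$ equals the full Hessian; this is the intuition behind the "small residual" assumption.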


Ensemble Learning and Linear Response Theory for ICA

Neural Information Processing Systems

We propose a general Bayesian framework for performing independent component analysis (ICA) which relies on ensemble learning and linear response theory known from statistical physics. We apply it to both discrete and continuous sources. For the continuous source the underdetermined (overcomplete) case is studied. The naive mean-field approach fails in this case, whereas linear response theory, which gives an improved estimate of covariances, is very efficient. The examples given are for sources without temporal correlations. However, this derivation can easily be extended to treat temporal correlations. Finally, the framework offers a simple way of generating new ICA algorithms without needing to define the prior distribution of the sources explicitly.


Spike-Timing-Dependent Learning for Oscillatory Networks

Neural Information Processing Systems

The model structure is an abstraction of the hippocampus or the olfactory cortex. We propose a simple generalized Hebbian rule, using temporal-activity-dependent LTP and LTD, to encode both magnitudes and phases of oscillatory patterns into the synapses in the network. After learning, the model responds resonantly to inputs which have been learned (or, for networks which operate essentially linearly, to linear combinations of learned inputs), but negligibly to other input patterns. Encoding both amplitude and phase enhances computational capacity, for which the price is having to learn both the excitatory-to-excitatory and the excitatory-to-inhibitory connections. Our model puts constraints on the form of the learning kernel $A(\tau)$ that should be experimentally observed. For example, for small oscillation frequencies it requires that the overall LTP dominates the overall LTD, but this requirement should be modified if the stored oscillations are of high frequencies.



Support Vector Novelty Detection Applied to Jet Engine Vibration Spectra

Neural Information Processing Systems

A system has been developed to extract diagnostic information from jet engine carcass vibration data. Support Vector Machines applied to novelty detection provide a measure of how unusual the shape of a vibration signature is, by learning a representation of normality. We describe a novel method for including information from a second class in Support Vector Machine novelty detection, and give results from its application to jet engine vibration analysis.



Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations

Neural Information Processing Systems

Based on a statistical mechanics approach, we develop a method for approximately computing average case learning curves for Gaussian process regression models. The approximation works well in the large sample size limit and for arbitrary dimensionality of the input space. We explain how the approximation can be systematically improved and argue that similar techniques can be applied to general likelihood models.

1 Introduction

Gaussian process (GP) models have gained considerable interest in the Neural Computation Community (see e.g. [1, 2, 3, 4]) in recent years. Because they are nonparametric by construction, their theoretical understanding seems to be less well developed than that of simpler parametric models like neural networks. We are especially interested in developing theoretical approaches which will at least give good approximations to generalization errors when the number of training data is sufficiently large. In this paper we present a step in this direction which is based on a statistical mechanics approach.


Place Cells and Spatial Navigation Based on 2D Visual Feature Extraction, Path Integration, and Reinforcement Learning

Neural Information Processing Systems

Visual input, provided by a video camera on a miniature robot, is preprocessed by a set of Gabor filters on 31 nodes of a log-polar retinotopic graph. Unsupervised Hebbian learning is employed to incrementally build a population of localized overlapping place fields. Place cells serve as basis functions for reinforcement learning. Experimental results for goal-oriented navigation of a mobile robot are presented.