Goto

Collaborating Authors

 Asia


A Variational Principle for Model-based Morphing

Neural Information Processing Systems

Given a multidimensional data set and a model of its density, we consider how to define the optimal interpolation between two points. This is done by assigning a cost to each path through space, based on two competing goals-one to interpolate through regions of high density, the other to minimize arc length. From this path functional, we derive the Euler-Lagrange equations for extremal motionj given two points, the desired interpolation is found by solving a boundary value problem. We show that this interpolation can be done efficiently, in high dimensions, for Gaussian, Dirichlet, and mixture models. 1 Introduction The problem of nonlinear interpolation arises frequently in image, speech, and signal processing. Consider the following two examples: (i) given two profiles of the same face, connect them by a smooth animation of intermediate poses[l]j (ii) given a telephone signal masked by intermittent noise, fill in the missing speech.


Hebb Learning of Features based on their Information Content

Neural Information Processing Systems

This paper investigates the stationary points of a Hebb learning rule with a sigmoid nonlinearity in it. We show mathematically that when the input has a low information content, as measured by the input's variance, this learning rule suppresses learning, that is, forces the weight vector to converge to the zero vector. When the information content exceeds a certain value, the rule will automatically begin to learn a feature in the input. Our analysis suggests that under certain conditions it is the first principal component that is learned. The weight vector length remains bounded, provided the variance of the input is finite. Simulations confirm the theoretical results derived.


Statistical Mechanics of the Mixture of Experts

Neural Information Processing Systems

The mixture of experts [1, 2] is a well known example which implements the philosophy of divide-and-conquer elegantly. Whereas this model are gaining more popularity in various applications, there have been little efforts to evaluate generalization capability of these modular approaches theoretically. Here we present the first analytic study of generalization in the mixture of experts from the statistical 184 K. Kang and 1. Oh physics perspective. Use of statistical mechanics formulation have been focused on the study of feedforward neural network architectures close to the multilayer perceptron[5, 6], together with the VC theory[8]. We expect that the statistical mechanics approach can also be effectively used to evaluate more advanced architectures including mixture models.


Dynamics of Training

Neural Information Processing Systems

A new method to calculate the full training process of a neural network is introduced. No sophisticated methods like the replica trick are used. The results are directly related to the actual number of training steps. Some results are presented here, like the maximal learning rate, an exact description of early stopping, and the necessary number of training steps. Further problems can be addressed with this approach.


Neural Learning in Structured Parameter Spaces - Natural Riemannian Gradient

Neural Information Processing Systems

The parameter space of neural networks has a Riemannian metric structure. The natural Riemannian gradient should be used instead of the conventional gradient, since the former denotes the true steepest descent direction of a loss function in the Riemannian space. The behavior of the stochastic gradient learning algorithm is much more effective if the natural gradient is used. The present paper studies the information-geometrical structure of perceptrons and other networks, and prove that the online learning method based on the natural gradient is asymptotically as efficient as the optimal batch algorithm. Adaptive modification of the learning constant is proposed and analyzed in terms of the Riemannian measure and is shown to be efficient.



Learning Exact Patterns of Quasi-synchronization among Spiking Neurons from Data on Multi-unit Recordings

Neural Information Processing Systems

This paper develops arguments for a family of temporal log-linear models to represent spatiotemporal correlations among the spiking events in a group of neurons. The models can represent not just pairwise correlations but also correlations of higher order. Methods are discussed for inferring the existence or absence of correlations and estimating their strength. A frequentist and a Bayesian approach to correlation detection are compared.


A Neural Model of Visual Contour Integration

Neural Information Processing Systems

Sometimes local features group into regions, as in texture segmentation; at other times they group into contours which may represent object boundaries. Although much is known about the processing steps that extract local features such as oriented input edges, it is still unclear how local features are grouped into global ones more meaningful for objects.


Why did TD-Gammon Work?

Neural Information Processing Systems

Although TD-Gammon is one of the major successes in machine learning, it has not led to similar impressive breakthroughs in temporal difference learning for other applications or even other games. We were able to replicate some of the success of TD-Gammon, developing a competitive evaluation function on a 4000 parameter feed-forward neural network, without using back-propagation, reinforcement or temporal difference learning methods. Instead we apply simple hill-climbing in a relative fitness environment. These results and further analysis suggest that the surprising success of Tesauro's program had more to do with the co-evolutionary structure of the learning task and the dynamics of the backgammon game itself. 1 INTRODUCTION It took great chutzpah for Gerald Tesauro to start wasting computer cycles on temporal difference learning in the game of Backgammon (Tesauro, 1992). After all, the dream of computers mastering a domain by self-play or "introspection" had been around since the early days of AI, forming part of Samuel's checker player (Samuel, 1959) and used in Donald Michie's MENACE tictac-toe learner (Michie, 1961).


Contour Organisation with the EM Algorithm

Neural Information Processing Systems

This paper describes how the early visual process of contour organisation canbe realised using the EM algorithm. The underlying computational representation is based on fine spline coverings. According toour EM approach the adjustment of spline parameters draws on an iterative weighted least-squares fitting process. The expectation step of our EM procedure computes the likelihood of the data using a mixture model defined over the set of spline coverings. Thesesplines are limited in their spatial extent using Gaussian windowing functions.