Goto

Collaborating Authors

 Country


Bayesian Query Construction for Neural Network Models

Neural Information Processing Systems

If data collection is costly, there is much to be gained by actively selecting particularly informative data points in a sequential way. In a Bayesian decision-theoretic framework we develop a query selection criterion which explicitly takes into account the intended use of the model predictions. By Markov Chain Monte Carlo methods the necessary quantities can be approximated to a desired precision. As the number of data points grows, the model complexity is modified by a Bayesian model selection strategy. The properties of two versions of the criterion ate demonstrated in numerical experiments.


A Non-linear Information Maximisation Algorithm that Performs Blind Separation

Neural Information Processing Systems

With the exception of (Becker 1992), there has been little attempt to use non-linearity in networks to achieve something a linear network could not. Nonlinear networks, however, are capable of computing more general statistics than those second-order ones involved in decorrelation, and as a consequence they are capable of dealing with signals (and noises) which have detailed higher-order structure. The success of the'H-J' networks at blind separation (Jutten & Herault 1991) suggests that it should be possible to separate statistically independent components, by using learning rules which make use of moments of all orders. This paper takes a principled approach to this problem, by starting with the question of how to maximise the information passed on in nonlinear feed-forward network. Starting with an analysis of a single unit, the approach is extended to a network mapping N inputs to N outputs. In the process, it will be shown that, under certain fairly weak conditions, the N ---. N network forms a minimally redundant encoding ofthe inputs, and that it therefore performs Independent Component Analysis (ICA). 2 Information maximisation The information that output Y contains about input X is defined as: I(Y, X) H(Y) - H(YIX) (1) where H(Y) is the entropy (information) in the output, while H(YIX) is whatever information the output has which didn't come from the input. In the case that we have no noise (or rather, we don't know what is noise and what is signal in the input), the mapping between X and Y is deterministic and H(YIX) has its lowest possible value of


Learning direction in global motion: two classes of psychophysically-motivated models

Neural Information Processing Systems

Perceptual learning is defined as fast improvement in performance and retention of the learned ability over a period of time. In a set of psychophysical experiments we demonstrated that perceptual learning occurs for the discrimination of direction in stochastic motion stimuli. Here we model this learning using two approaches: a clustering model that learns to accommodate the motion noise, and an averaging model that learns to ignore the noise. Simulations of the models show performance similar to the psychophysical results. 1 Introduction Global motion perception is critical to many visual tasks: to perceive self-motion, to identify objects in motion, to determine the structure of the environment, and to make judgements for safe navigation. In the presence of noise, as in random dot kinematograms, efficient extraction of global motion involves considerable spatial integration. Newsome and Colleagues (1989) showed that neurons in the macaque middle temporal area (MT) are motion direction-selective, and perform global integration of motion in their large receptive fields. Psychophysical studies in humans have characterized the limits of spatial and temporal integration in motion (Watamaniuk et.


Recurrent Networks: Second Order Properties and Pruning

Neural Information Processing Systems

Second order properties of cost functions for recurrent networks are investigated. We analyze a layered fully recurrent architecture, the virtue of this architecture is that it features the conventional feedforward architecture as a special case. A detailed description of recursive computation of the full Hessian of the network cost function is provided. We discuss the possibility of invoking simplifying approximations of the Hessian and show how weight decays iron the cost function and thereby greatly assist training. We present tentative pruning results, using Hassibi et al.'s Optimal Brain Surgeon, demonstrating that recurrent networks can construct an efficient internal memory. 1 LEARNING IN RECURRENT NETWORKS Time series processing is an important application area for neural networks and numerous architectures have been suggested, see e.g. (Weigend and Gershenfeld, 94). The most general structure is a fully recurrent network and it may be adapted using Real Time Recurrent Learning (RTRL) suggested by (Williams and Zipser, 89). By invoking a recurrent network, the length of the network memory can be adapted to the given time series, while it is fixed for the conventional lag-space net (Weigend et al., 90). In forecasting, however, feedforward architectures remain the most popular structures; only few applications are reported based on the Williams&Zipser approach.


Unsupervised Classification of 3D Objects from 2D Views

Neural Information Processing Systems

The human visual system can recognize various 3D (three-dimensional) objects from their 2D (two-dimensional) retinal images although the images vary significantly as the viewpoint changes. Recent computational models have explored how to learn to recognize 3D objects from their projected views (Poggio & Edelman, 1990). Most existing models are, however, based on supervised learning, i.e., during training the teacher tells which object each view belongs to. The model proposed by Weinshall et al. (1990) also requires a signal that segregates different objects during training. This paper, on the other hand, discusses unsupervised aspects of 3D object recognition where the system discovers categories by itself.


Reinforcement Learning Methods for Continuous-Time Markov Decision Problems

Neural Information Processing Systems

Semi-Markov Decision Problems are continuous time generalizations of discrete time Markov Decision Problems. A number of reinforcement learning algorithms have been developed recently for the solution of Markov Decision Problems, based on the ideas of asynchronous dynamic programming and stochastic approximation. Among these are TD(,x), Q-Iearning, and Real-time Dynamic Programming. After reviewing semi-Markov Decision Problems and Bellman's optimality equation in that context, we propose algorithms similar to those named above, adapted to the solution of semi-Markov Decision Problems. We demonstrate these algorithms by applying them to the problem of determining the optimal control for a simple queueing system. We conclude with a discussion of circumstances under which these algorithms may be usefully applied.


Neural Network Ensembles, Cross Validation, and Active Learning

Neural Information Processing Systems

It is well known that a combination of many different predictors can improve predictions. In the neural networks community "ensembles" of neural networks has been investigated by several authors, see for instance [1, 2, 3]. Most often the networks in the ensemble are trained individually and then their predictions are combined. This combination is usually done by majority (in classification) or by simple averaging (in regression), but one can also use a weighted combination of the networks.


Active Learning with Statistical Models

Neural Information Processing Systems

For many types of learners one can compute the statistically "optimal" way to select data. We review how these techniques have been used with feedforward neural networks [MacKay, 1992; Cohn, 1994]. We then show how the same principles may be used to select data for two alternative, statistically-based learning architectures: mixtures of Gaussians and locally weighted regression. While the techniques for neural networks are expensive and approximate, the techniques for mixtures of Gaussians and locally weighted regression are both efficient and accurate.


Factorial Learning by Clustering Features

Neural Information Processing Systems

We introduce a novel algorithm for factorial learning, motivated by segmentation problems in computational vision, in which the underlying factors correspond to clusters of highly correlated input features. The algorithm derives from a new kind of competitive clustering model, in which the cluster generators compete to explain each feature of the data set and cooperate to explain each input example, rather than competing for examples and cooperating on features, as in traditional clustering algorithms. A natural extension of the algorithm recovers hierarchical models of data generated from multiple unknown categories, each with a different, multiple causal structure. Several simulations demonstrate the power of this approach.


Reinforcement Learning Predicts the Site of Plasticity for Auditory Remapping in the Barn Owl

Neural Information Processing Systems

In young barn owls raised with optical prisms over their eyes, these auditory maps are shifted to stay in register with the visual map, suggesting that the visual input imposes a frame of reference on the auditory maps. However, the optic tectum, the first site of convergence of visual with auditory information, is not the site of plasticity for the shift of the auditory maps; the plasticity occurs instead in the inferior colliculus, which contains an auditory map and projects into the optic tectum. We explored a model of the owl remapping in which a global reinforcement signal whose delivery is controlled by visual foveation. A hebb learning rule gated by reinforcement learned to appropriately adjust auditory maps. In addition, reinforcement learning preferentially adjusted the weights in the inferior colliculus, as in the owl brain, even though the weights were allowed to change throughout the auditory system. This observation raises the possibility that the site of learning does not have to be genetically specified, but could be determined by how the learning procedure interacts with the network architecture.