Goto

Collaborating Authors

 Uncertainty


Statistically Efficient Estimations Using Cortical Lateral Connections

Neural Information Processing Systems

Coarse codes are widely used throughout the brain to encode sensory and motor variables. Methods designed to interpret these codes, such as population vector analysis, are either inefficient, i.e., the variance of the estimate is much larger than the smallest possible variance, or biologically implausible, like maximum likelihood. Moreover, these methods attempt to compute a scalar or vector estimate of the encoded variable. Neurons are faced with a similar estimation problem. They must read out the responses of the presynaptic neurons, but, by contrast, they typically encode the variable with a further population code rather than as a scalar. We show how a nonlinear recurrent network can be used to perform these estimation in an optimal way while keeping the estimate in a coarse code format. This work suggests that lateral connections in the cortex may be involved in cleaning up uncorrelated noise among neurons representing similar variables.


Learning Exact Patterns of Quasi-synchronization among Spiking Neurons from Data on Multi-unit Recordings

Neural Information Processing Systems

This paper develops arguments for a family of temporal log-linear models to represent spatiotemporal correlations among the spiking events in a group of neurons. The models can represent not just pairwise correlations but also correlations of higher order. Methods are discussed for inferring the existence or absence of correlations and estimating their strength. A frequentist and a Bayesian approach to correlation detection are compared.


Contour Organisation with the EM Algorithm

Neural Information Processing Systems

This paper describes how the early visual process of contour organisation canbe realised using the EM algorithm. The underlying computational representation is based on fine spline coverings. According toour EM approach the adjustment of spline parameters draws on an iterative weighted least-squares fitting process. The expectation step of our EM procedure computes the likelihood of the data using a mixture model defined over the set of spline coverings. Thesesplines are limited in their spatial extent using Gaussian windowing functions.


Recursive Algorithms for Approximating Probabilities in Graphical Models

Neural Information Processing Systems

Department of Brain and Cognitive Sciences Massachusetts Institute of Technology Cambridge, MA 02139 Abstract We develop a recursive node-elimination formalism for efficiently approximating large probabilistic networks. No constraints are set on the network topologies. Yet the formalism can be straightforwardly integratedwith exact methods whenever they are/become applicable. The approximations we use are controlled: they maintain consistentlyupper and lower bounds on the desired quantities at all times. We show that Boltzmann machines, sigmoid belief networks, or any combination (i.e., chain graphs) can be handled within the same framework.


The Generalisation Cost of RAMnets

Neural Information Processing Systems

Neural Computing Research Group Aston University Aston Triangle, Birmingham B4 7ET, UK. Abstract Given unlimited computational resources, it is best to use a criterion ofminimal expected generalisation error to select a model and determine its parameters. However, it may be worthwhile to sacrifice somegeneralisation performance for higher learning speed. A method for quantifying sub-optimality is set out here, so that this choice can be made intelligently. Furthermore, the method is applicable to a broad class of models, including the ultra-fast memory-based methods such as RAMnets. This brings the added benefit of providing, for the first time, the means to analyse the generalisation properties of such models in a Bayesian framework . 1 Introduction In order to quantitatively predict the performance of methods such as the ultra-fast RAMnet, which are not trained by minimising a cost function, we develop a Bayesian formalism for estimating the generalisation cost of a wide class of algorithms.


Statistically Efficient Estimations Using Cortical Lateral Connections

Neural Information Processing Systems

Coarse codes are widely used throughout the brain to encode sensory andmotor variables. Methods designed to interpret these codes, such as population vector analysis, are either inefficient, i.e., the variance of the estimate is much larger than the smallest possible variance,or biologically implausible, like maximum likelihood. Moreover, these methods attempt to compute a scalar or vector estimate of the encoded variable. Neurons are faced with a similar estimationproblem. They must read out the responses of the presynaptic neurons, but, by contrast, they typically encode the variable with a further population code rather than as a scalar. We show how a nonlinear recurrent network can be used to perform theseestimation in an optimal way while keeping the estimate in a coarse code format. This work suggests that lateral connections inthe cortex may be involved in cleaning up uncorrelated noise among neurons representing similar variables.


Bayesian Unsupervised Learning of Higher Order Structure

Neural Information Processing Systems

Many real world patterns have a hierarchical underlying structure in which simple features have a higher order structure among themselves. Because these relationships are often statistical in nature, it is natural to view the process of discovering such structures as a statistical inference problem in which a hierarchical model is fit to data. Hierarchical statistical structure can be conveniently represented with Bayesian belief networks (Pearl, 1988; Lauritzen and Spiegelhalter, 1988; Neal, 1992). These 530 M.S. Lewicki and T. 1. Sejnowski models are powerful, because they can capture complex statistical relationships among the data variables, and also mathematically convenient, because they allow efficient computation of the joint probability for any given set of model parameters. The joint probability density of a network of binary states is given by a product of conditional probabilities (1) where W is the weight matrix that parameterizes the model. Note that the probability ofan individual state Si depends only on its parents.


Ordered Classes and Incomplete Examples in Classification

Neural Information Processing Systems

The classes in classification tasks often have a natural ordering, and the training and testing examples are often incomplete. We propose a nonlinear ordinalmodel for classification into ordered classes. Predictive, simulation-based approaches are used to learn from past and classify future incompleteexamples. These techniques are illustrated by making prognoses for patients who have suffered severe head injuries.


An Apobayesian Relative of Winnow

Neural Information Processing Systems

We study a mistake-driven variant of an online Bayesian learning algorithm(similar to one studied by Cesa-Bianchi, Helmbold, and Panizza [CHP96]). This variant only updates its state (learns) on trials in which it makes a mistake. The algorithm makes binary classifications using a linear-threshold classifier and runs in time linear inthe number of attributes seen by the learner. We have been able to show, theoretically and in simulations, that this algorithm performs well under assumptions quite different from those embodied inthe prior of the original Bayesian algorithm. It can handle situations that we do not know how to handle in linear time with Bayesian algorithms. We expect our techniques to be useful in deriving and analyzing other apobayesian algorithms. 1 Introduction We consider two styles of online learning.


Analysis of Temporal-Diffference Learning with Function Approximation

Neural Information Processing Systems

The algorithm weanalyze performs online updating of a parameter vector during a single endless trajectory of an aperiodic irreducible finite state Markov chain. Results include convergence (with probability 1), a characterization of the limit of convergence, and a bound on the resulting approximation error. In addition to establishing new and stronger results than those previously available, our analysis is based on a new line of reasoning that provides new intuition about the dynamics of temporal-difference learning. Furthermore, we discuss the implications of two counterexamples with regards to the Significance of online updating and linearly parameterized function approximators. 1 INTRODUCTION The problem of predicting the expected long-term future cost (or reward) of a stochastic dynamic system manifests itself in both time-series prediction and control. Anexample in time-series prediction is that of estimating the net present value of a corporation, as a discounted sum of its future cash flows, based on the current state of its operations. In control, the ability to predict long-term future cost as a function of state enables the ranking of alternative states in order to guide decision-making. Indeed, such predictions constitute the cost-to-go function that is central to dynamic programming and optimal control (Bertsekas, 1995). Temporal-difference learning, originally proposed by Sutton (1988), is a method for approximating long-term future cost as a function of current state.