Country
Learning Factored Representations for Partially Observable Markov Decision Processes
The problem of reinforcement learning in a non-Markov environment is explored using a dynamic Bayesian network, where conditional independence assumptions between random variables are compactly represented by network parameters. The parameters are learned online, and approximations are used to perform inference and to compute the optimal value function. The relative effects of inference and value function approximations on the quality of the final policy are investigated, by learning to solve a moderately difficult driving task. The two value function approximations, linear and quadratic, were found to perform similarly, but the quadratic model was more sensitive to initialization. Both performed below the level of human performance on the task. The dynamic Bayesian network performed comparably to a model using a localist hidden state representation, while requiring exponentially fewer parameters.
Spiking Boltzmann Machines
Hinton, Geoffrey E., Brown, Andrew D.
We first show how to represent sharp posterior probability distributions using real valued coefficients on broadly-tuned basis functions. Then we show how the precise times of spikes can be used to convey the real-valued coefficients on the basis functions quickly and accurately. Finally we describe a simple simulation in which spiking neurons learn to model an image sequence by fitting a dynamic generative model. 1 Population codes and energy landscapes A perceived object is represented in the brain by the activities of many neurons, but there is no general consensus on how the activities of individual neurons combine to represent the multiple properties of an object. We start by focussing on the case of a single object that has multiple instantiation parameters such as position, velocity, size and orientation. We assume that each neuron has an ideal stimulus in the space of instantiation parameters and that its activation rate or probability of activation falls off monotonically in all directions as the actual stimulus departs from this ideal.
Robust Learning of Chaotic Attractors
Bakker, Rembrandt, Schouten, Jaap C., Coppens, Marc-Olivier, Takens, Floris, Giles, C. Lee, Bleek, Cor M. van den
A fundamental problem with the modeling of chaotic time series data is that minimizing short-term prediction errors does not guarantee a match between the reconstructed attractors of model and experiments. We introduce a modeling paradigm that simultaneously learns to short-tenn predict and to locate the outlines of the attractor by a new way of nonlinear principal component analysis. Closed-loop predictions are constrained to stay within these outlines, to prevent divergence from the attractor. Learning is exceptionally fast: parameter estimation for the 1000 sample laser data from the 1991 Santa Fe time series competition took less than a minute on a 166 MHz Pentium PC.
Robust Full Bayesian Methods for Neural Networks
Andrieu, Christophe, Freitas, João F. G. de, Doucet, Arnaud
In particular, Mackay showed that by approximating the distributions of the weights with Gaussians and adopting smoothing priors, it is possible to obtain estimates of the weights and output variances and to automatically set the regularisation coefficients. Neal (1996) cast the net much further by introducing advanced Bayesian simulation methods, specifically the hybrid Monte Carlo method, into the analysis of neural networks [3]. Bayesian sequential Monte Carlo methods have also been shown to provide good training results, especially in time-varying scenarios [4]. More recently, Rios Insua and Muller (1998) and Holmes and Mallick (1998) have addressed the issue of selecting the number of hidden neurons with growing and pruning algorithms from a Bayesian perspective [5,6]. In particular, they apply the reversible jump Markov Chain Monte Carlo (MCMC) algorithm of Green [7] to feed-forward sigmoidal networks and radial basis function (RBF) networks to obtain joint estimates of the number of neurons and weights. We also apply the reversible jump MCMC simulation algorithm to RBF networks so as to compute the joint posterior distribution of the radial basis parameters and the number of basis functions. However, we advance this area of research in two important directions. Firstly, we propose a full hierarchical prior for RBF networks.
Learning from User Feedback in Image Retrieval Systems
Vasconcelos, Nuno, Lippman, Andrew
We formulate the problem of retrieving images from visual databases as a problem of Bayesian inference. This leads to natural and effective solutions for two of the most challenging issues in the design of a retrieval system: providing support for region-based queries without requiring prior image segmentation, and accounting for user-feedback during a retrieval session. We present a new learning algorithm that relies on belief propagation to account for both positive and negative examples of the user's interests.
Optimal Sizes of Dendritic and Axonal Arbors
I consider a topographic projection between two neuronal layers with different densities of neurons. Given the number of output neurons connected to each input neuron (divergence or fan-out) and the number of input neurons synapsing on each output neuron (convergence or fan-in) I determine the widths of axonal and dendritic arbors which minimize the total volume ofaxons and dendrites. My analytical results can be summarized qualitatively in the following rule: neurons of the sparser layer should have arbors wider than those of the denser layer. This agrees with the anatomical data from retinal and cerebellar neurons whose morphology and connectivity are known. The rule may be used to infer connectivity of neurons from their morphology.
Information Factorization in Connectionist Models of Perception
Movellan, Javier R., McClelland, James L.
We examine a psychophysical law that describes the influence of stimulus and context on perception. According to this law choice probability ratios factorize into components independently controlled by stimulus and context. It has been argued that this pattern of results is incompatible with feedback models of perception. In this paper we examine this claim using neural network models defined via stochastic differential equations. We show that the law is related to a condition named channel separability and has little to do with the existence of feedback connections. In essence, channels are separable if they converge into the response units without direct lateral connections to other channels and if their sensors are not directly contaminated by external inputs to the other channels. Implications of the analysis for cognitive and computational neurosicence are discussed.
Gaussian Fields for Approximate Inference in Layered Sigmoid Belief Networks
Local "belief propagation" rules of the sort proposed by Pearl [15] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have empirically demonstrated good performance of "loopy belief propagation" using these same rules on graphs with loops. Perhaps the most dramatic instance is the near Shannon-limit performance of "Turbo codes", whose decoding algorithm is equivalent to loopy belief propagation. Except for the case of graphs with a single loop, there has been little theoretical understanding of the performance of loopy propagation. Here we analyze belief propagation in networks with arbitrary topologies when the nodes in the graph describe jointly Gaussian random variables.
An Improved Decomposition Algorithm for Regression Support Vector Machines
The Karush-Kuhn-Tucker Theorem is used to derive conditions for determining whether or not a given working set is optimal. These conditions become the algorithm)s termination criteria) as an alternative to Osuna)s criteria (also used by Joachims without modification) which used conditions for individual points. The advantage of the new conditions is that knowledge of the hyperplane)s constant factor b) which in some cases is difficult to compute) is not required. Further investigation of the new termination conditions allows to form the strategy for selecting an optimal working set. The new algorithm is applicable to the pattern recognition SVM) and is provably equivalent to Joachims) algorithm. One can also interpret the new algorithm in the sense of the method of feasible directions. Experimental results presented in the last section demonstrate superior performance of the new method in comparison with traditional training of regression SVM. 2 General Principles of Regression SVM Decomposition The original decomposition algorithm proposed for the pattern recognition SVM in [2] has been extended to the regression SVM in [4]. For the sake of completeness I will repeat the main steps of this extension with the aim of providing terse and streamlined notation to lay the ground for working set selection.