Learning Graphical Models
Learning to Parse Images
Hinton, Geoffrey E., Ghahramani, Zoubin, Teh, Yee Whye
We describe a class of probabilistic models that we call credibility networks. Using parse trees as internal representations of images, credibility networks are able to perform segmentation and recognition simultaneously, removing the need for ad hoc segmentation heuristics. Promising results in the problem of segmenting handwritten digits were obtained.
Variational Inference for Bayesian Mixtures of Factor Analysers
Ghahramani, Zoubin, Beal, Matthew J.
Zoubin Ghahramani and Matthew J. Beal, Gatsby Computational Neuroscience Unit, University College London, 17 Queen Square, London WC1N 3AR, England, {zoubin,m.beal}@gatsby.ucl.ac.uk. Abstract: We present an algorithm that infers the model structure of a mixture of factor analysers using an efficient and deterministic variational approximation to full Bayesian integration over model parameters. This procedure can automatically determine the optimal number of components and the local dimensionality of each component (i.e. the number of factors in each factor analyser). Alternatively it can be used to infer posterior distributions over number of components and dimensionalities. Since all parameters are integrated out the method is not prone to overfitting. Using a stochastic procedure for adding components it is possible to perform the variational optimisation incrementally and to avoid local maxima.
The Nonnegative Boltzmann Machine
Downs, Oliver B., MacKay, David J. C., Lee, Daniel D.
The nonnegative Boltzmann machine (NNBM) is a recurrent neural network model that can describe multimodal nonnegative data. Application of maximum likelihood estimation to this model gives a learning rule that is analogous to the binary Boltzmann machine. We examine the utility of the mean field approximation for the NNBM, and describe how Monte Carlo sampling techniques can be used to learn its parameters. Reflective slice sampling is particularly well-suited for this distribution, and can efficiently be implemented to sample the distribution. We illustrate learning of the NNBM on a translationally invariant distribution, as well as on a generative model for images of human faces. Introduction The multivariate Gaussian is the most elementary distribution used to model generic data.
Reconstruction of Sequential Data with Probabilistic Models and Continuity Constraints
We consider the problem of reconstructing a temporal discrete sequence of multidimensional real vectors when part of the data is missing, under the assumption that the sequence was generated by a continuous process. A particular case of this problem is multivariate regression, which is very difficult when the underlying mapping is one-to-many. We propose an algorithm based on a joint probability model of the variables of interest, implemented using a nonlinear latent variable model. Each point in the sequence is potentially reconstructed as any of the modes of the conditional distribution of the missing variables given the present variables (computed using an exhaustive mode search in a Gaussian mixture). Mode selection is determined by a dynamic programming search that minimises a geometric measure of the reconstructed sequence, derived from continuity constraints. We illustrate the algorithm with a toy example and apply it to a real-world inverse problem, the acoustic-to-articulatory mapping. The results show that the algorithm outperforms conditional mean imputation and multilayer perceptrons.
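The mode-selection step described in this abstract can be illustrated with a small dynamic programming sketch. The abstract does not say which geometric measure is minimised, so this sketch assumes total Euclidean path length. The function name select_modes and the list-of-candidates input format are hypothetical, and the candidate modes are taken as given rather than computed from a Gaussian mixture.

```python
import math

def select_modes(candidates):
    """Pick one candidate per time step so the total path length is minimal.

    candidates: list over time steps; each entry is a non-empty list of
    candidate points (tuples of floats), e.g. the modes of a conditional
    distribution at that step. Path length is an assumed continuity measure.
    Returns (path, total_length).
    """
    # cost[j]: length of the shortest path ending at candidate j of the current step
    cost = [0.0] * len(candidates[0])
    back = []  # back[t][j]: best predecessor (at step t) of candidate j at step t + 1
    for prev, curr in zip(candidates, candidates[1:]):
        new_cost, pointers = [], []
        for point in curr:
            # Extend the cheapest path coming from any candidate of the previous step.
            options = [cost[i] + math.dist(p, point) for i, p in enumerate(prev)]
            best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[best])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    j = min(range(len(cost)), key=cost.__getitem__)
    total = cost[j]
    indices = [j]
    for pointers in reversed(back):
        j = pointers[j]
        indices.append(j)
    indices.reverse()
    path = [candidates[t][k] for t, k in enumerate(indices)]
    return path, total
```

With two candidate modes at each of the first two steps and a single known value at the third, for example select_modes([[(0.0,), (10.0,)], [(1.0,), (9.0,)], [(2.0,)]]), the search prefers the branch that stays close to the known value. The cost is quadratic in the number of candidates per step and linear in the sequence length.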
Robust Neural Network Regression for Offline and Online Learning
Briegel, Thomas, Tresp, Volker
Although one can derive the Gaussian noise assumption based on a maximum entropy approach, the main reason for this assumption is practicability: under the Gaussian noise assumption the maximum likelihood parameter estimate can simply be found by minimization of the squared error. Despite its common use it is far from clear that the Gaussian noise assumption is a good choice for many practical problems. A reasonable approach therefore would be to use a noise distribution which contains the Gaussian as a special case but which has a tunable parameter that allows for more flexible distributions.
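As a hedged illustration of such a family, the sketch below compares the per-sample negative log-likelihood under Gaussian noise with that under Student-t noise. The choice of the Student-t distribution is an assumption made here for illustration; the excerpt above does not name the distribution. Its degrees-of-freedom parameter nu is the tunable parameter, and the Gaussian is recovered as nu grows large. The function names are hypothetical.

```python
import math

def gaussian_nll(residual, sigma=1.0):
    """Negative log-likelihood of one residual under zero-mean Gaussian noise."""
    return 0.5 * math.log(2 * math.pi * sigma**2) + residual**2 / (2 * sigma**2)

def student_t_nll(residual, nu, sigma=1.0):
    """Negative log-likelihood of one residual under Student-t noise.

    nu is the degrees of freedom (assumed tunable parameter); sigma is a scale.
    """
    z2 = (residual / sigma) ** 2
    # Log of the normalising constant of the scaled Student-t density.
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi) - math.log(sigma))
    return -log_norm + 0.5 * (nu + 1) * math.log1p(z2 / nu)
```

For large nu the two losses agree closely, while for small nu a large residual is penalised only logarithmically rather than quadratically, which is what makes the fit less sensitive to outliers.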
Independent Factor Analysis with Temporally Structured Sources
We present a new technique for time series analysis based on dynamic probabilistic networks. In this approach, the observed data are modeled in terms of unobserved, mutually independent factors, as in the recently introduced technique of Independent Factor Analysis (IFA). However, unlike in IFA, the factors are not i.i.d.; each factor has its own temporal statistical characteristics. We derive a family of EM algorithms that learn the structure of the underlying factors and their relation to the data. These algorithms perform source separation and noise reduction in an integrated manner, and demonstrate superior performance compared to IFA. 1 Introduction The technique of independent factor analysis (IFA) introduced in [1] provides a tool for modeling L'-dim data in terms of L unobserved factors. These factors are mutually independent and combine linearly with added noise to produce the observed data.
Robust Full Bayesian Methods for Neural Networks
Andrieu, Christophe, Freitas, João F. G. de, Doucet, Arnaud
In particular, MacKay showed that by approximating the distributions of the weights with Gaussians and adopting smoothing priors, it is possible to obtain estimates of the weights and output variances and to automatically set the regularisation coefficients. Neal (1996) cast the net much further by introducing advanced Bayesian simulation methods, specifically the hybrid Monte Carlo method, into the analysis of neural networks [3]. Bayesian sequential Monte Carlo methods have also been shown to provide good training results, especially in time-varying scenarios [4]. More recently, Rios Insua and Muller (1998) and Holmes and Mallick (1998) have addressed the issue of selecting the number of hidden neurons with growing and pruning algorithms from a Bayesian perspective [5,6]. In particular, they apply the reversible jump Markov Chain Monte Carlo (MCMC) algorithm of Green [7] to feed-forward sigmoidal networks and radial basis function (RBF) networks to obtain joint estimates of the number of neurons and weights. We also apply the reversible jump MCMC simulation algorithm to RBF networks so as to compute the joint posterior distribution of the radial basis parameters and the number of basis functions. However, we advance this area of research in two important directions. Firstly, we propose a full hierarchical prior for RBF networks.
Efficient Approaches to Gaussian Process Classification
Csató, Lehel, Fokoué, Ernest, Opper, Manfred, Schottky, Bernhard, Winther, Ole
The first two methods are related to mean field ideas known in Statistical Physics. The third approach is based on a Bayesian online approach which was motivated by recent results in the Statistical Mechanics of Neural Networks. We present simulation results showing: 1. that the mean field Bayesian evidence may be used for hyperparameter tuning and 2. that the online approach may achieve a low training error fast. 1 Introduction Gaussian processes provide promising nonparametric Bayesian approaches to regression and classification [2, 1].
Population Decoding Based on an Unfaithful Model
Wu, Si, Nakahara, Hiroyuki, Murata, Noboru, Amari, Shun-ichi
We study a population decoding paradigm in which the maximum likelihood inference is based on an unfaithful decoding model (UMLI). This is usually the case for neural population decoding because the encoding process of the brain is not exactly known, or because a simplified decoding model is preferred for saving computational cost. We consider an unfaithful decoding model which neglects the pairwise correlation between neuronal activities, and prove that UMLI is asymptotically efficient when the neuronal correlation is uniform or of limited-range. The performance of UMLI is compared with that of the maximum likelihood inference based on a faithful model and that of the center of mass decoding method. It turns out that UMLI has the advantages of remarkably decreasing the computational complexity and maintaining a high level of decoding accuracy at the same time. The effect of correlation on the decoding accuracy is also discussed.
Spiking Boltzmann Machines
Hinton, Geoffrey E., Brown, Andrew D.
We first show how to represent sharp posterior probability distributions using real-valued coefficients on broadly-tuned basis functions. Then we show how the precise times of spikes can be used to convey the real-valued coefficients on the basis functions quickly and accurately. Finally we describe a simple simulation in which spiking neurons learn to model an image sequence by fitting a dynamic generative model. 1 Population codes and energy landscapes A perceived object is represented in the brain by the activities of many neurons, but there is no general consensus on how the activities of individual neurons combine to represent the multiple properties of an object. We start by focussing on the case of a single object that has multiple instantiation parameters such as position, velocity, size and orientation. We assume that each neuron has an ideal stimulus in the space of instantiation parameters and that its activation rate or probability of activation falls off monotonically in all directions as the actual stimulus departs from this ideal.