 Learning Graphical Models


Sparse Code Shrinkage: Denoising by Nonlinear Maximum Likelihood Estimation

Neural Information Processing Systems

Sparse coding is a method for finding a representation of data in which each of the components of the representation is only rarely significantly active. Such a representation is closely related to redundancy reduction and independent component analysis, and has some neurophysiological plausibility. In this paper, we show how sparse coding can be used for denoising. Using maximum likelihood estimation of nongaussian variables corrupted by gaussian noise, we show how to apply a shrinkage nonlinearity on the components of sparse coding so as to reduce noise. Furthermore, we show how to choose the optimal sparse coding basis for denoising.
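As a rough illustration of what a shrinkage nonlinearity looks like, the sketch below assumes that each sparse component has a Laplace (double exponential) density and is observed in additive Gaussian noise. Under that assumption the maximum a posteriori estimate is a soft threshold. The Laplace choice and the names `laplace_shrink`, `denoise_components`, `noise_var` and `signal_std` are illustrative assumptions, not taken from the abstract; the paper's own nonlinearity depends on the density it fits to the components.

```python
import math


def laplace_shrink(y, noise_var, signal_std):
    """Shrink one noisy component y = s + n toward zero.

    Assumes (for illustration only) that s has a Laplace density with
    standard deviation signal_std and that n is Gaussian with variance
    noise_var. The MAP estimate of s is then a soft threshold.
    """
    threshold = math.sqrt(2.0) * noise_var / signal_std
    # Components smaller than the threshold are set to zero;
    # larger ones are moved toward zero by the threshold.
    return math.copysign(max(0.0, abs(y) - threshold), y)


def denoise_components(components, noise_var, signal_std):
    """Apply the shrinkage to every component of a sparse code."""
    return [laplace_shrink(y, noise_var, signal_std) for y in components]
```

In a full denoising pipeline the data would first be transformed into the sparse coding basis, shrunk component by component as above, and then transformed back.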


Learning from Dyadic Data

Neural Information Processing Systems

Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This type of data arises naturally in many applications ranging from computational linguistics and information retrieval to preference analysis and computer vision. In this paper, we present a systematic, domain-independent framework of learning from dyadic data by statistical mixture models. Our approach covers different models with flat and hierarchical latent class structures. We propose an annealed version of the standard EM algorithm for model fitting which is empirically evaluated on a variety of data sets from different domains.

1 Introduction

Over the past decade learning from data has become a highly active field of research distributed over many disciplines like pattern recognition, neural computation, statistics, machine learning, and data mining.



Learning Nonlinear Dynamical Systems Using an EM Algorithm

Neural Information Processing Systems

The Expectation-Maximization (EM) algorithm is an iterative procedure for maximum likelihood parameter estimation from data sets with missing or hidden variables [2]. It has been applied to system identification in linear stochastic state-space models, where the state variables are hidden from the observer and both the state and the parameters of the model have to be estimated simultaneously [9]. We present a generalization of the EM algorithm for parameter estimation in nonlinear dynamical systems. The "expectation" step makes use of Extended Kalman Smoothing to estimate the state, while the "maximization" step re-estimates the parameters using these uncertain state estimates. In general, the nonlinear maximization step is difficult because it requires integrating out the uncertainty in the states. However, if Gaussian radial basis function (RBF) approximators are used to model the nonlinearities, the integrals become tractable and the maximization step can be solved via systems of linear equations.



Approximate Learning of Dynamic Models

Neural Information Processing Systems

Inference is a key component in learning probabilistic models from partially observable data. When learning temporal models, each of the many inference phases requires a traversal over an entire long data sequence; furthermore, the data structures manipulated are exponentially large, making this process computationally expensive. In [2], we describe an approximate inference algorithm for monitoring stochastic processes, and prove bounds on its approximation error. In this paper, we apply this algorithm as an approximate forward propagation step in an EM algorithm for learning temporal Bayesian networks. We provide a related approximation for the backward step, and prove error bounds for the combined algorithm.


Learning Multi-Class Dynamics

Neural Information Processing Systems

Yule-Walker) are available for learning Auto-Regressive process models of simple, directly observable, dynamical processes. When sensor noise means that dynamics are observed only approximately, learning can still be achieved via Expectation-Maximisation (EM) together with Kalman Filtering. However, this does not handle more complex dynamics, involving multiple classes of motion.
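For readers unfamiliar with the baseline mentioned above, the following is a minimal sketch of Yule-Walker estimation for a directly observed auto-regressive process. It covers only that simple single-class case, not the multi-class learning the paper addresses. The function names `yule_walker` and `simulate_ar`, the use of biased autocovariance estimates, and the unit-variance driving noise in the simulator are assumptions made for the example.

```python
import numpy as np


def yule_walker(x, order):
    """Estimate AR coefficients a_1..a_p and the driving-noise variance
    for the model x[t] = sum_k a_k * x[t-k] + e[t]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased sample autocovariances r[0], ..., r[order].
    r = np.array([np.dot(x[:n - k], x[k:]) / n for k in range(order + 1)])
    # Toeplitz system R a = (r[1], ..., r[order]) with R[i, j] = r[|i - j|].
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    noise_var = r[0] - np.dot(a, r[1:])
    return a, noise_var


def simulate_ar(coeffs, n, seed=0):
    """Draw n samples from an AR process with unit-variance Gaussian noise."""
    rng = np.random.default_rng(seed)
    p = len(coeffs)
    x = np.zeros(n + p)
    noise = rng.standard_normal(n + p)
    for t in range(p, n + p):
        # x[t - p:t][::-1] holds x[t-1], x[t-2], ..., x[t-p].
        x[t] = np.dot(coeffs, x[t - p:t][::-1]) + noise[t]
    return x[p:]
```

A typical check is to simulate a long sequence with known coefficients, for example `simulate_ar([0.6, -0.3], 20000)`, and confirm that `yule_walker(x, 2)` returns values close to them.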


Bayesian PCA

Neural Information Processing Systems

The technique of principal component analysis (PCA) has recently been expressed as the maximum likelihood solution for a generative latent variable model. In this paper we use this probabilistic reformulation as the basis for a Bayesian treatment of PCA. Our key result is that effective dimensionality of the latent space (equivalent to the number of retained principal components) can be determined automatically as part of the Bayesian inference procedure. An important application of this framework is to mixtures of probabilistic PCA models, in which each component can determine its own effective complexity.

1 Introduction

Principal component analysis (PCA) is a widely used technique for data analysis. Recently Tipping and Bishop (1997b) showed that a specific form of generative latent variable model has the property that its maximum likelihood solution extracts the principal subspace of the observed data set.
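The maximum likelihood solution referred to above has a closed form, which the sketch below computes. It assumes the standard probabilistic PCA model x = W z + mu + noise with isotropic Gaussian noise of variance sigma2, and it takes the latent dimension `q` as a hand-picked input. It therefore illustrates only the starting point of the paper, not the Bayesian procedure that determines the dimensionality automatically. The function name `ppca_ml` is an assumption made for the example.

```python
import numpy as np


def ppca_ml(X, q):
    """Closed-form maximum likelihood fit of probabilistic PCA.

    X is an (n, d) data matrix and q < d is the latent dimension.
    Returns the mean, a (d, q) weight matrix W (determined only up to
    a rotation of the latent space) and the noise variance sigma2.
    """
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    S = np.cov(X, rowvar=False, bias=True)  # ML covariance estimate
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]  # sort eigenpairs, largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Noise variance: average of the discarded eigenvalues.
    sigma2 = eigvals[q:].mean()
    # Columns of W span the principal subspace, scaled so that
    # W W^T + sigma2 * I reproduces the top q eigenvalues of S.
    W = eigvecs[:, :q] * np.sqrt(eigvals[:q] - sigma2)
    return mu, W, sigma2
```

The fitted covariance `W @ W.T + sigma2 * np.eye(d)` matches the sample covariance exactly along the retained principal directions and replaces the remaining directions with a single averaged variance.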


Mean Field Methods for Classification with Gaussian Processes

Neural Information Processing Systems

We discuss the application of TAP mean field methods known from the Statistical Mechanics of disordered systems to Bayesian classification models with Gaussian processes. In contrast to previous approaches, no knowledge about the distribution of inputs is needed. Simulation results for the Sonar data set are given. They have been recently introduced into the Neural Computation community (Neal 1996, Williams & Rasmussen 1996, Mackay 1997). If we assume fields with zero prior mean, the statistics of h are entirely defined by the second order correlations C(s, s') = E[h(s)h(s')], where E denotes expectations with respect to the prior. Interesting examples are C(s, s') (1) C(s, s') (2) The choice (1) can be motivated as a limit of a two-layered neural network with infinitely many hidden units with factorizable input-hidden weight priors (Williams 1997).


Synergy and Redundancy among Brain Cells of Behaving Monkeys

Neural Information Processing Systems

While it is unlikely that complete information from any macroscopic neural tissue will ever be available, some interesting insight can be obtained from simultaneously recorded cells in the cortex of behaving animals. The question we address in this study is the level of synergy, or the level of cooperation, among brain cells, as determined by the information they provide about the observed behavior of the animal.