Goto

Collaborating Authors

 Europe


Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

Neural Information Processing Systems

Discrete Fourier transforms and other related Fourier methods have been practically implementable due to the fast Fourier transform (FFT). However there are many situations where doing fast Fourier transforms without complete data would be desirable. In this paper it is recognised that formulating the FFT algorithm as a belief network allows suitable priors to be set for the Fourier coefficients. Furthermore efficient generalised belief propagation methods between clusters of four nodes enable the Fourier coefficients to be inferred and the missing data to be estimated in near to O(n log n) time, where n is the total of the given and missing data points. This method is compared with a number of common approaches such as setting missing data to zero or to interpolation. It is tested on generated data and for a Fourier analysis of a damaged audio signal.


Wormholes Improve Contrastive Divergence

Neural Information Processing Systems

In models that define probabilities via energies, maximum likelihood learning typically involves using Markov Chain Monte Carlo to sample from the model's distribution. If the Markov chain is started at the data distribution, learning often works well even if the chain is only run for a few time steps [3]. But if the data distribution contains modes separated by regions of very low density, brief MCMC will not ensure that different modes have the correct relative energies because it cannot move particles from one mode to another. We show how to improve brief MCMC by allowing long-range moves that are suggested by the data distribution. If the model is approximately correct, these long-range moves have a reasonable acceptance rate.


Denoising and Untangling Graphs Using Degree Priors

Neural Information Processing Systems

This paper addresses the problem of untangling hidden graphs from a set of noisy detections of undirected edges. We present a model of the generation of the observed graph that includes degree-based structure priors on the hidden graphs. Exact inference in the model is intractable; we present an efficient approximate inference algorithm to compute edge appearance posteriors. We evaluate our model and algorithm on a biological graph inference problem.


Approximability of Probability Distributions

Neural Information Processing Systems

We consider the question of how well a given distribution can be approximated with probabilistic graphical models. We introduce a new parameter, effective treewidth, that captures the degree of approximability as a tradeoff between the accuracy and the complexity of approximation. We present a simple approach to analyzing achievable tradeoffs that exploits the threshold behavior of monotone graph properties, and provide experimental results that support the approach.


Approximate Expectation Maximization

Neural Information Processing Systems

The E-step boils down to computing probabilities of the hidden variables given the observed variables (evidence) and current set of parameters. The M-step then, given these probabilities, yields a new set of parameters guaranteed to increase the likelihood. In Bayesian networks, that will be the focus of this article, the M-step is usually relatively straightforward. A complication may arise in the E-step, when computing the probability of the hidden variables given the evidence becomes intractable. An often used approach is to replace the exact yet intractable inference in the E step with approximate inference, either through sampling or using a deterministic variational method. The use of a "mean-field" variational method in this context leads to an algorithm known as variational EM and can be given theinterpretation of minimizing a free energy with respect to both a tractable approximate distribution (approximate E-step) and the parameters (M-step) [2]. Loopy belief propagation [3] and variants thereof, such as generalized belief propagation [4] and expectation propagation [5], have become popular alternatives to the "mean-field" variational approaches, often yielding somewhat better approximations. And indeed, they can and have been applied for approximate inference in the E-step of the EM algorithm (see e.g.


Can We Learn to Beat the Best Stock

Neural Information Processing Systems

A novel algorithm for actively trading stocks is presented. While traditional universal algorithms (and technical trading heuristics) attempt to predict winners or trends, our approach relies on predictable statistical relations between all pairs of stocks in the market. Our empirical results on historical markets provide strong evidence that this type of technical trading can "beat the market" and moreover, can beat the best stock in the market. In doing so we utilize a new idea for smoothing critical parameters in the context of expert learning.


Warped Gaussian Processes

Neural Information Processing Systems

This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm chooses a nonlinear transformation such that transformed data is well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation.


Gaussian Process Latent Variable Models for Visualisation of High Dimensional Data

Neural Information Processing Systems

In this paper we introduce a new underlying probabilistic model for principal component analysis (PCA). Our formulation interprets PCA as a particular Gaussian process prior on a mapping from a latent space to the observed data-space. We show that if the prior's covariance function constrains the mappings to be linear the model is equivalent to PCA, we then extend the model by considering less restrictive covariance functions which allow nonlinear mappings. This more general Gaussian process latent variable model (GPLVM) is then evaluated as an approach to the visualisation of high dimensional data for three different data-sets. Additionally our nonlinear algorithm can be further kernelised leading to'twin kernel PCA' in which a mapping between feature spaces occurs.


Learning with Local and Global Consistency

Neural Information Processing Systems

We consider the general problem of learning from labeled and unlabeled data, which is often called semi-supervised learning or transductive inference. A principled approach to semi-supervised learning is to design a classifying function which is sufficiently smooth with respect to the intrinsic structure collectively revealed by known labeled and unlabeled points. We present a simple algorithm to obtain such a smooth solution. Our method yields encouraging experimental results on a number of classification problems and demonstrates effective use of unlabeled data.


Non-linear CCA and PCA by Alignment of Local Models

Neural Information Processing Systems

We propose a nonlinear Canonical Correlation Analysis (CCA) method which works by coordinating or aligning mixtures of linear models. In the same way that CCA extends the idea of PCA, our work extends recent methods for nonlinear dimensionality reduction to the case where multiple embeddings of the same underlying low dimensional coordinates are observed, each lying on a different high dimensional manifold. We also show that a special case of our method, when applied to only a single manifold, reduces to the Laplacian Eigenmaps algorithm. As with previous alignment schemes, once the mixture models have been estimated, all of the parameters of our model can be estimated in closed form without local optima in the learning. Experimental results illustrate the viability of the approach as a nonlinear extension of CCA.