
Collaborating Authors

 Europe


From Mixtures of Mixtures to Adaptive Transform Coding

Neural Information Processing Systems

We establish a principled framework for adaptive transform coding. Transform coders are often constructed by concatenating an ad hoc choice of transform with suboptimal bit allocation and quantizer design. Instead, we start from a probabilistic latent variable model in the form of a mixture of constrained Gaussian mixtures. From this model we derive a transform coding algorithm, which is a constrained version of the generalized Lloyd algorithm for vector quantizer design. A byproduct of our derivation is the introduction of a new transform basis, which, unlike other transforms (PCA, DCT, etc.), is explicitly optimized for coding. Image compression experiments show that adaptive transform coders designed with our algorithm improve compressed-image signal-to-noise ratio by up to 3 dB compared to global transform coding and by 0.5 to 2 dB compared to other adaptive transform coders.

1 Introduction

Compression algorithms for image and video signals often use transform coding as a low-complexity alternative to vector quantization (VQ).
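The abstract describes its coder design as a constrained version of the generalized Lloyd algorithm. For orientation, here is a minimal NumPy sketch of the ordinary, unconstrained generalized Lloyd algorithm for squared-error vector quantizer design. It is not the authors' constrained transform-coding variant. The function name, the random initialisation and the iteration count are illustrative assumptions.

```python
import numpy as np

def generalized_lloyd(X, k, iters=50, seed=0):
    """Design a k-codeword vector quantizer for the rows of X under squared error.

    Alternates the two Lloyd optimality conditions: nearest-neighbour
    partition, then centroid codewords. (Illustrative sketch only.)
    """
    rng = np.random.default_rng(seed)
    # Initialise the codebook with k distinct training vectors (an assumption).
    codebook = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    distortions = []
    for _ in range(iters):
        # Nearest-neighbour condition: assign each vector to its closest codeword.
        d2 = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
        assign = d2.argmin(axis=1)
        distortions.append(d2[np.arange(len(X)), assign].mean())
        # Centroid condition: move each codeword to the mean of its cell.
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:  # leave codewords of empty cells unchanged
                codebook[j] = members.mean(axis=0)
    return codebook, assign, distortions
```

Each pass applies the two optimality conditions in turn. As a result, the recorded mean distortion should never increase from one pass to the next.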


Generalizable Singular Value Decomposition for Ill-posed Datasets

Neural Information Processing Systems

Because the training examples in an ill-posed data set do not fully span the signal space, the observed training-set variances along each basis vector will be too high compared to the average variance of the test-set projections onto the same basis vectors. On the basis of this understanding, we introduce the Generalizable Singular Value Decomposition (GenSVD) as a means to reduce this bias by re-estimating the singular values obtained in a conventional Singular Value Decomposition, allowing for increased generalization performance of a subsequent statistical model. We demonstrate that the algorithm successfully corrects bias in a data set from a functional PET activation study of the human brain.

1 Ill-posed Data Sets

An ill-posed data set has more dimensions in each example than there are examples. Such data sets occur in many fields of research, typically in connection with image measurements. The associated statistical problem is that of extracting structure from the observed high-dimensional vectors in the presence of noise. The statistical analysis can be done either supervised (i.e.
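The variance bias described in this abstract can be reproduced with a small NumPy experiment. The sketch rests on assumptions of our own: isotropic Gaussian data, 20 training examples in 500 dimensions, and no centring. It illustrates the problem only. It does not implement the GenSVD re-estimation itself.

```python
import numpy as np

def train_test_variances(n_train=20, n_test=2000, dim=500, seed=0):
    """Per-component variances of training vs. test data on the training-set
    SVD basis, for isotropic Gaussian data with dim > n_train (ill-posed)."""
    rng = np.random.default_rng(seed)
    X_train = rng.normal(size=(n_train, dim))
    X_test = rng.normal(size=(n_test, dim))
    # Right singular vectors of the training matrix span only n_train directions.
    _, s, Vt = np.linalg.svd(X_train, full_matrices=False)
    train_var = s ** 2 / n_train                    # training variance along each basis vector
    test_var = ((X_test @ Vt.T) ** 2).mean(axis=0)  # test projections onto the same vectors
    return train_var, test_var
```

With these settings the mean training variance per basis vector is roughly dim / n_train (about 25). The test projections, by contrast, have variance close to 1.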



Algorithms for Non-negative Matrix Factorization

Neural Information Processing Systems

Nonnegative matrix factorization (NMF) has previously been shown to be a useful decomposition for multivariate data. Two different multiplicative algorithms for NMF are analyzed. They differ only slightly in the multiplicative factor used in the update rules. One algorithm can be shown to minimize the conventional least squares error, while the other minimizes the generalized Kullback-Leibler divergence. The monotonic convergence of both algorithms can be proven using an auxiliary function analogous to that used for proving convergence of the Expectation-Maximization algorithm. The algorithms can also be interpreted as diagonally rescaled gradient descent, where the rescaling factor is optimally chosen to ensure convergence.
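The two multiplicative update rules can be sketched in NumPy as follows. The update formulas are the standard ones for the least squares and generalized Kullback-Leibler objectives. The small `eps` guard against division by zero, the random initialisation and the fixed iteration count are assumptions added for the sketch.

```python
import numpy as np

def nmf(V, r, loss="euclidean", iters=200, eps=1e-10, seed=0):
    """Factor a non-negative V (n x m) as W @ H with W (n x r), H (r x m) >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps
    H = rng.random((r, m)) + eps
    for _ in range(iters):
        if loss == "euclidean":
            # Multiplicative updates for ||V - WH||^2.
            H *= (W.T @ V) / (W.T @ W @ H + eps)
            W *= (V @ H.T) / (W @ H @ H.T + eps)
        elif loss == "kl":
            # Multiplicative updates for the generalized KL divergence D(V || WH).
            WH = W @ H + eps
            H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
            WH = W @ H + eps
            W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
        else:
            raise ValueError("loss must be 'euclidean' or 'kl'")
    return W, H
```

Each update multiplies by a non-negative factor. For that reason W and H remain non-negative throughout, with no projection step needed.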


The Kernel Gibbs Sampler

Neural Information Processing Systems

We present an algorithm that samples the hypothesis space of kernel classifiers. A uniform prior over normalised weight vectors combined with a likelihood based on a model of label noise leads to a piecewise constant posterior that can be sampled by the kernel Gibbs sampler (KGS). The KGS is a Markov Chain Monte Carlo method that chooses a random direction in parameter space and samples from the resulting piecewise constant density along the chosen line. The KGS can be used as an analytical tool for the exploration of Bayesian transduction, Bayes point machines, active learning, and evidence-based model selection on small data sets that are contaminated with label noise. For a simple toy example, we demonstrate experimentally how a Bayes point machine based on the KGS outperforms an SVM, which is incapable of taking label noise into account.

1 Introduction

Two great ideas have dominated recent developments in machine learning: the application of kernel methods and the popularisation of Bayesian inference. Focusing on the task of classification, various connections between the two areas exist: kernels have long been a part of Bayesian inference in the guise of covariance functions that characterise priors over functions [9].
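The core step described in this abstract is to pick a random direction and then sample exactly from a piecewise constant density along it. This can be sketched in NumPy. The sketch is a simplified illustration, not the authors' KGS, and makes three assumptions:

- it works in an explicit linear feature space instead of a kernel expansion;
- it moves along the great circle spanned by the current unit weight vector and the random direction, so the sample stays normalised;
- it uses a label-flip likelihood with a hypothetical `noise` rate.

```python
import numpy as np

def sample_on_great_circle(w, X, y, noise=0.1, rng=None):
    """One Gibbs-style step (sketch): random direction, then exact sampling of
    the piecewise constant posterior restricted to the resulting great circle.

    Assumes a linear classifier sign(<w, x>), a uniform prior on the unit
    sphere, and likelihood (1 - noise) per correct and `noise` per wrong label.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = w / np.linalg.norm(w)
    # Random unit direction orthogonal to w; w(t) = cos(t) w + sin(t) d is a unit vector.
    d = rng.normal(size=w.shape)
    d -= (d @ w) * w
    d /= np.linalg.norm(d)
    a, b = y * (X @ w), y * (X @ d)  # margin along the circle: a cos t + b sin t
    # a cos t + b sin t = R cos(t - phi) changes sign at phi +/- pi/2.
    phi = np.arctan2(b, a)
    cuts = np.concatenate([(phi + np.pi / 2) % (2 * np.pi),
                           (phi - np.pi / 2) % (2 * np.pi)])
    edges = np.concatenate([[0.0], np.sort(cuts), [2 * np.pi]])
    lengths = np.diff(edges)
    mids = edges[:-1] + lengths / 2
    # Number of training errors on each arc (constant within an arc).
    errors = ((a[None, :] * np.cos(mids)[:, None]
               + b[None, :] * np.sin(mids)[:, None]) <= 0).sum(axis=1)
    # Log weight of each arc: log(arc length) + log likelihood on that arc.
    m = len(y)
    with np.errstate(divide="ignore"):  # zero-length arcs get weight 0
        logw = np.log(lengths) + errors * np.log(noise) + (m - errors) * np.log1p(-noise)
    p = np.exp(logw - logw.max())
    p /= p.sum()
    k = rng.choice(len(p), p=p)
    t = rng.uniform(edges[k], edges[k + 1])  # uniform within the chosen arc
    return np.cos(t) * w + np.sin(t) * d
```

Along the circle, each training point's margin changes sign at exactly two angles, so the posterior is constant on the arcs between those angles. The step first picks an arc with probability proportional to its length times its likelihood. It then samples uniformly within that arc.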


Sparse Representation for Gaussian Process Models

Neural Information Processing Systems

We develop a sparse representation for Gaussian process (GP) models in order to overcome the limitations that large data sets impose on GPs. The method combines a Bayesian online algorithm with a sequential construction of a relevant subsample of the data, which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.
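The abstract does not state the rule used to build the subsample. A residual-based inclusion test of the kind common in sparse online GP methods can be sketched as follows. The squared-exponential kernel, the threshold `tol` and the helper names are assumptions. The Bayesian online update of the posterior mean and covariance is omitted.

```python
import numpy as np

def rbf(A, B, length=1.0):
    """Squared-exponential kernel matrix between the rows of A and B (assumed kernel)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-0.5 * d2 / length ** 2)

def select_basis(X, tol=1e-2, length=1.0, jitter=1e-10):
    """Greedy single pass (hypothetical rule): keep x_i only if its feature
    vector is not already well approximated by the span of the kept points."""
    basis = [0]
    for i in range(1, len(X)):
        B = X[basis]
        K_bb = rbf(B, B, length) + jitter * np.eye(len(basis))
        k_b = rbf(B, X[i:i + 1], length)[:, 0]
        # Squared residual of projecting phi(x_i) onto span{phi(b) : b in basis};
        # k(x, x) = 1 for this kernel.
        gamma = 1.0 - k_b @ np.linalg.solve(K_bb, k_b)
        if gamma > tol:
            basis.append(i)
    return basis
```

A point whose feature vector is (almost) a linear combination of the stored points adds little new information, so the sketch skips it.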


Analysis of Bit Error Probability of Direct-Sequence CDMA Multiuser Demodulators

Neural Information Processing Systems

We analyze the bit error probability of multiuser demodulators for direct-sequence binary phase-shift keying (DS/BPSK) CDMA channels with additive Gaussian noise. The problem of multiuser demodulation is cast into a finite-temperature decoding problem, and replica analysis is applied to evaluate the performance of the resulting MPM (Marginal Posterior Mode) demodulators, which include the optimal demodulator and the MAP demodulator as special cases. An approximate implementation of the demodulators is proposed, using an analog-valued Hopfield model as a naive mean-field approximation to the MPM demodulators, and its performance is also evaluated by replica analysis. The results of the performance evaluation show the effectiveness of the optimal demodulator and the mean-field demodulator compared with the conventional one, especially in cases of small information bit rate and low noise level.

1 Introduction

The CDMA (Code-Division Multiple-Access) technique [1] is important as a fundamental technology of digital communication systems such as cellular phones. Its important applications include the realization of spread-spectrum multipoint-to-point communication systems, in which multiple users share the same communication channel.
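A toy simulation can illustrate the conventional (matched-filter) demodulator alongside a naive mean-field demodulator of the Hopfield type mentioned in this abstract. The following are assumptions for this NumPy sketch:

- the channel scaling y = S b + sigma * noise;
- the inverse temperature `beta`;
- the damping factor;
- all function names.

The sketch does not reproduce the replica analysis.

```python
import numpy as np

def simulate_cdma(K=8, N=64, sigma=0.5, seed=0):
    """Synchronous DS/BPSK CDMA (assumed scaling): y = S b + sigma * noise,
    with K users, N chips, and random spreading codes S in {-1, +1}^(N x K)."""
    rng = np.random.default_rng(seed)
    S = rng.choice([-1.0, 1.0], size=(N, K))
    b = rng.choice([-1.0, 1.0], size=K)
    y = S @ b + sigma * rng.normal(size=N)
    return S, b, y

def conventional(S, y):
    """Conventional single-user demodulator: sign of the matched-filter output."""
    return np.sign(S.T @ y / len(y))

def mean_field(S, y, beta, iters=100, damping=0.5):
    """Naive mean-field demodulator: iterate m = tanh(beta * (h - W_off @ m))."""
    N = len(y)
    h = S.T @ y / N                  # matched-filter outputs
    W = S.T @ S / N                  # code cross-correlations (diagonal = 1)
    W_off = W - np.diag(np.diag(W))  # interference between different users
    m = np.zeros(S.shape[1])
    for _ in range(iters):
        m = damping * m + (1 - damping) * np.tanh(beta * (h - W_off @ m))
    return np.sign(m)
```

Under this channel model the posterior over bits is proportional to exp(beta * (h^T b - 0.5 * sum over k != l of W_kl b_k b_l)), with beta = N / sigma^2 at the true noise level. The mean-field update subtracts the estimated interference from the other users before thresholding. The conventional demodulator skips that subtraction.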


Second Order Approximations for Probability Models

Neural Information Processing Systems

In this paper, we derive a second order mean field theory for directed graphical probability models. Using an information-theoretic argument, we show how this can be done in the absence of a partition function. The method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, the method is shown to greatly improve on the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has complexity comparable to that of the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.
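The TAP approximation that this abstract generalises can be illustrated on a small Boltzmann machine with units in {-1, +1} and p(s) proportional to exp(0.5 * s^T J s + theta^T s). The NumPy sketch below compares three estimates:

- exact magnetisations, by enumeration;
- the naive first-order mean field equations;
- TAP, which adds the second-order Onsager reaction term.

It does not implement the paper's method for directed models. The coupling scale, damping and iteration count are assumptions.

```python
import itertools
import numpy as np

def exact_magnetisations(J, theta):
    """Exact <s_i> by enumerating all states of
    p(s) proportional to exp(0.5 * s^T J s + theta^T s), s in {-1, +1}^n."""
    n = len(theta)
    states = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    log_p = 0.5 * np.einsum("ki,ij,kj->k", states, J, states) + states @ theta
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    return p @ states

def mf_magnetisations(J, theta, tap=False, iters=500, damping=0.5):
    """Damped fixed-point iteration of the naive (tap=False) or TAP mean field equations."""
    m = np.zeros(len(theta))
    for _ in range(iters):
        field = theta + J @ m
        if tap:
            # Onsager reaction term: m_i * sum_j J_ij^2 (1 - m_j^2).
            field = field - m * ((J ** 2) @ (1 - m ** 2))
        m = damping * m + (1 - damping) * np.tanh(field)
    return m
```

For weak couplings, the TAP correction typically reduces the error of the naive mean field estimate.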


Whence Sparseness?

Neural Information Processing Systems

It has been shown that the receptive fields of simple cells in V1 can be explained by assuming optimal encoding, provided that an extra constraint of sparseness is added. This finding suggests that there is a reason, independent of optimal representation, for sparseness. However, this work used an ad hoc model for the noise. Here I show that, if a biologically more plausible noise model describing neurons as Poisson processes is used, sparseness does not have to be added as a constraint. Thus I conclude that sparseness is not a feature that evolution has striven for, but is simply the result of the evolutionary pressure towards an optimal representation.

1 Introduction

Recently there has been a resurgence of interest in using optimal coding strategies to 'explain' the response properties of neurons in the primary sensory areas [1].


Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

Neural Information Processing Systems

Experimental data show that biological synapses behave quite differently from the symbolic synapses in common artificial neural network models. Biological synapses are dynamic, i.e., their "weight" changes on a short time scale by several hundred percent depending on the past input to the synapse. In this article we explore the consequences that these synaptic dynamics entail for the computational power of feedforward neural networks. We show that gradient descent suffices to approximate a given (quadratic) filter by a rather small neural system with dynamic synapses. We also compare our network model to artificial neural networks designed for time series processing. Our numerical results are complemented by a theoretical analysis showing that even with just a single hidden layer, such networks can approximate a surprisingly large class of nonlinear filters: all filters that can be characterized by Volterra series. This result is robust with regard to various changes in the model for synaptic dynamics.
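The abstract does not specify its synapse model. One widely used model of short-term synaptic dynamics is the Markram-Tsodyks form, which has a facilitation variable u and a depression variable R. The NumPy sketch below assumes that model and illustrative parameter values. It shows how each spike's postsynaptic amplitude depends on the preceding spike train.

```python
import numpy as np

def dynamic_synapse(spike_times, A=1.0, U=0.5, D=1.0, F=0.05):
    """Postsynaptic amplitudes for a spike train under an assumed
    Markram-Tsodyks-style short-term plasticity model.

    u: utilisation (facilitation), R: available resources (depression),
    D and F: recovery and facilitation time constants (same units as spike_times).
    """
    spike_times = np.asarray(spike_times, dtype=float)
    amps = np.empty(len(spike_times))
    u, R = U, 1.0
    for n in range(len(spike_times)):
        if n > 0:
            dt = spike_times[n] - spike_times[n - 1]
            # Recovery since the previous spike, using that spike's u and R.
            R = R * (1 - u) * np.exp(-dt / D) + 1 - np.exp(-dt / D)
            u = U + u * (1 - U) * np.exp(-dt / F)
        amps[n] = A * u * R
    return amps
```

With these defaults, closely spaced spikes produce shrinking amplitudes (depression). Other choices of U, D and F yield facilitating rather than depressing behaviour. In both cases the effective "weight" depends on the input history.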