A Winner-Take-All Circuit with Controllable Soft Max Property

Neural Information Processing Systems

I describe a silicon network consisting of a group of excitatory neurons and a global inhibitory neuron. The output of the inhibitory neuron is normalized with respect to the input strengths.
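As a purely mathematical illustration of what a "controllable soft max" can mean, the sketch below normalizes a set of input strengths with a softmax whose sharpness is set by a gain parameter. It is not a model of the silicon circuit; the function name `soft_wta` and the `gain` parameter are assumptions introduced here for illustration.

```python
import math

def soft_wta(inputs, gain):
    """Softmax over input strengths; `gain` (hypothetical knob) sets sharpness.

    gain = 0 gives a uniform output; a large gain approaches a hard
    winner-take-all in which the strongest input receives nearly all the mass.
    """
    peak = max(inputs)  # subtract the maximum for numerical stability
    exps = [math.exp(gain * (x - peak)) for x in inputs]
    total = sum(exps)
    return [e / total for e in exps]

if __name__ == "__main__":
    strengths = [1.0, 1.2, 0.8]
    for g in (0.0, 5.0, 50.0):
        print(g, [round(p, 4) for p in soft_wta(strengths, g)])
```

Raising the gain moves the output smoothly from an even split toward a single winner, which is the behaviour the term "soft max" usually refers to.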



The Parallel Problems Server: an Interactive Tool for Large Scale Machine Learning

Neural Information Processing Systems

Imagine that you wish to classify data consisting of tens of thousands of examples residing in a twenty thousand dimensional space. How can one apply standard machine learning algorithms? We describe the Parallel Problems Server (PPServer) and MATLAB*P. In tandem they allow users of networked computers to work transparently on large data sets from within Matlab. This work is motivated by the desire to bring the many benefits of scientific computing algorithms and computational power to machine learning researchers. We demonstrate the usefulness of the system on a number of tasks. For example, we perform independent components analysis on very large text corpora consisting of tens of thousands of documents, making minimal changes to the original Bell and Sejnowski Matlab source (Bell and Sejnowski, 1995). Applying ML techniques to data previously beyond their reach leads to interesting analyses of both data and algorithms.


Manifold Stochastic Dynamics for Bayesian Learning

Neural Information Processing Systems

We propose a new Markov Chain Monte Carlo algorithm which is a generalization of the stochastic dynamics method. The algorithm performs exploration of the state space using its intrinsic geometric structure, facilitating efficient sampling of complex distributions. Applied to Bayesian learning in neural networks, our algorithm was found to perform at least as well as the best state-of-the-art method while consuming considerably less time.


Data Visualization and Feature Selection: New Algorithms for Nongaussian Data

Neural Information Processing Systems

Visualization of input data and feature selection are intimately related. A good feature selection algorithm can identify meaningful coordinate projections for low dimensional data visualization. Conversely, a good visualization technique can suggest meaningful features to include in a model. Input variable selection is the most important step in the model selection process. Given a target variable, a set of input variables can be selected as explanatory variables by some prior knowledge.


A MCMC Approach to Hierarchical Mixture Modelling

Neural Information Processing Systems

There are many hierarchical clustering algorithms available, but these lack a firm statistical basis. Here we set up a hierarchical probabilistic mixture model, where data is generated in a hierarchical tree-structured manner. Markov chain Monte Carlo (MCMC) methods are demonstrated which can be used to sample from the posterior distribution over trees containing variable numbers of hidden units.


Correctness of Belief Propagation in Gaussian Graphical Models of Arbitrary Topology

Neural Information Processing Systems

Local "belief propagation" rules of the sort proposed by Pearl [15] are guaranteed to converge to the correct posterior probabilities in singly connected graphical models. Recently, a number of researchers have empirically demonstrated good performance of "loopy belief propagation" using these same rules on graphs with loops. Perhaps the most dramatic instance is the near Shannon-limit performance of "Turbo codes", whose decoding algorithm is equivalent to loopy belief propagation. Except for the case of graphs with a single loop, there has been little theoretical understanding of the performance of loopy propagation. Here we analyze belief propagation in networks with arbitrary topologies when the nodes in the graph describe jointly Gaussian random variables.
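To make the Gaussian setting concrete, here is a minimal sketch of belief propagation on a jointly Gaussian model written in information form, p(x) proportional to exp(-x'Ax/2 + b'x), run on a three-node graph with a loop. The function name `gaussian_bp`, the synchronous update schedule, and the example matrix are assumptions chosen for illustration; they are not taken from the paper.

```python
def gaussian_bp(A, b, iters=100):
    """Loopy belief propagation for p(x) ~ exp(-x'Ax/2 + b'x), A symmetric.

    Each message i->j is a 1-D Gaussian factor over x_j, stored as a
    precision term P[(i, j)] and a linear (potential) term h[(i, j)].
    """
    n = len(b)
    nbrs = [[j for j in range(n) if j != i and A[i][j] != 0.0] for i in range(n)]
    P = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    h = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    for _ in range(iters):
        new_P, new_h = {}, {}
        for i in range(n):
            for j in nbrs[i]:
                # Combine the local term with all incoming messages except j's.
                prec = A[i][i] + sum(P[(k, i)] for k in nbrs[i] if k != j)
                pot = b[i] + sum(h[(k, i)] for k in nbrs[i] if k != j)
                # Integrate out x_i to obtain the message to j.
                new_P[(i, j)] = -A[i][j] ** 2 / prec
                new_h[(i, j)] = -A[i][j] * pot / prec
        P, h = new_P, new_h
    means, variances = [], []
    for i in range(n):
        prec = A[i][i] + sum(P[(k, i)] for k in nbrs[i])
        pot = b[i] + sum(h[(k, i)] for k in nbrs[i])
        means.append(pot / prec)
        variances.append(1.0 / prec)
    return means, variances

if __name__ == "__main__":
    # Fully connected 3-node graph (one loop), diagonally dominant precision.
    A = [[4.0, 1.0, 1.0], [1.0, 4.0, 1.0], [1.0, 1.0, 4.0]]
    b = [1.0, 2.0, 3.0]
    means, variances = gaussian_bp(A, b)
    print(means, variances)
```

For this example the exact posterior mean, the solution of Ax = b, is (0, 1/3, 2/3), and the propagated means can be compared against it directly. The returned variances are the beliefs' own estimates; on a graph with loops they need not equal the exact marginal variances, so they are best treated as approximations.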


Dual Estimation and the Unscented Transformation

Neural Information Processing Systems

Dual estimation refers to the problem of simultaneously estimating the state of a dynamic system and the model which gives rise to the dynamics. Algorithms include expectation-maximization (EM), dual Kalman filtering, and joint Kalman methods. These methods have recently been explored in the context of nonlinear modeling, where a neural network is used as the functional form of the unknown model. Typically, an extended Kalman filter (EKF) or smoother is used for the part of the algorithm that estimates the clean state given the current estimated model. An EKF may also be used to estimate the weights of the network. This paper points out the flaws in using the EKF, and proposes an improvement based on a new approach called the unscented transformation (UT) [3]. A substantial performance gain is achieved with the same order of computational complexity as that of the standard EKF. The approach is illustrated on several dual estimation methods.
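The idea behind the unscented transformation can be sketched in one dimension: instead of linearizing the nonlinearity as the EKF does, a small set of deterministically chosen sigma points is pushed through it and the output mean and variance are recovered from weighted sums. The sketch below uses the common sigma-point scheme with a scaling parameter `kappa`; the function name, the scalar setting, and the default `kappa=2.0` are assumptions for illustration and may differ from the formulation used in the paper.

```python
import math

def unscented_transform_1d(mean, var, f, kappa=2.0):
    """Propagate a scalar Gaussian (mean, var) through f using 3 sigma points.

    `kappa` is a scaling parameter (assumed here); for a scalar state,
    1 + kappa = 3 is a common choice for Gaussian inputs.
    """
    n = 1  # state dimension
    spread = math.sqrt((n + kappa) * var)
    points = [mean, mean + spread, mean - spread]
    weights = [kappa / (n + kappa), 0.5 / (n + kappa), 0.5 / (n + kappa)]
    ys = [f(x) for x in points]
    y_mean = sum(w * y for w, y in zip(weights, ys))
    y_var = sum(w * (y - y_mean) ** 2 for w, y in zip(weights, ys))
    return y_mean, y_var

if __name__ == "__main__":
    # Quadratic nonlinearity applied to a zero-mean, unit-variance input.
    print(unscented_transform_1d(0.0, 1.0, lambda x: x * x))
```

A first-order linearization of x squared around a zero mean would predict zero output mean and zero variance, whereas the sigma-point estimate retains the effect of the input spread. That difference is the kind of EKF shortcoming the abstract alludes to.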


Support Vector Method for Multivariate Density Estimation

Neural Information Processing Systems

A new method for multivariate density estimation is developed based on the Support Vector Method (SVM) solution of inverse ill-posed problems. The solution has the form of a mixture of densities. This method with Gaussian kernels compared favorably to both Parzen's method and the Gaussian Mixture Model method. For synthetic data we achieve more accurate estimates for densities of 2, 6, 12, and 40 dimensions.

1 Introduction

The problem of multivariate density estimation is important for many applications, in particular, for speech recognition [1] [7]. When the unknown density belongs to a parametric set satisfying certain conditions one can estimate it using the maximum likelihood (ML) method. Often these conditions are too restrictive. Therefore, nonparametric methods were proposed. The most popular of these, Parzen's method [5], uses the following estimate given data
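For readers unfamiliar with Parzen's method, the following is a minimal one-dimensional sketch of a Parzen window estimate with a Gaussian kernel, in its usual textbook form. It is not the paper's notation or its multivariate setting; the function name `parzen_density` and the `bandwidth` parameter are assumptions for illustration.

```python
import math

def parzen_density(x, samples, bandwidth):
    """Textbook 1-D Parzen window estimate with a Gaussian kernel.

    The estimate is the average of Gaussian bumps of width `bandwidth`
    centred on each observed sample.
    """
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )

if __name__ == "__main__":
    data = [-1.0, 0.0, 0.5, 2.0]
    for x in (-1.0, 0.25, 3.0):
        print(x, parzen_density(x, data, bandwidth=0.5))
```

Every sample contributes one kernel, so the estimate is a mixture with as many components as data points. The abstract's method also yields a mixture of densities, which makes the Parzen estimate a natural baseline for comparison.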


The Relevance Vector Machine

Neural Information Processing Systems

The support vector machine (SVM) is a state-of-the-art technique for regression and classification, combining excellent generalisation properties with a sparse kernel representation. However, it does suffer from a number of disadvantages, notably the absence of probabilistic outputs, the requirement to estimate a tradeoff parameter and the need to utilise 'Mercer' kernel functions. In this paper we introduce the Relevance Vector Machine (RVM), a Bayesian treatment of a generalised linear model of identical functional form to the SVM. The RVM suffers from none of the above disadvantages, and examples demonstrate that for comparable generalisation performance, the RVM requires dramatically fewer kernel functions.