Goto

Collaborating Authors

 Country



Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter

Neural Information Processing Systems

When we model a higher order functions, such as learning and memory, we face a difficulty of comparing neural activities with hidden variables that depend on the history of sensory and motor signals and the dynamics ofthe network. Here, we propose novel method for estimating hidden variables of a learning agent, such as connection weights from sequences of observable variables. Bayesian estimation is a method to estimate the posterior probability of hidden variables from observable data sequence using a dynamic model of hidden and observable variables.


On the Concentration of Expectation and Approximate Inference in Layered Networks

Neural Information Processing Systems

We present an analysis of concentration-of-expectation phenomena in layered Bayesian networks that use generalized linear models as the local conditional probabilities. This framework encompasses a wide variety of probability distributions, including both discrete and continuous random variables. We utilize ideas from large deviation analysis and the delta method to devise and evaluate a class of approximate inference algorithms forlayered Bayesian networks that have superior asymptotic error bounds and very fast computation time.


The Doubly Balanced Network of Spiking Neurons: A Memory Model with High Capacity

Neural Information Processing Systems

A balanced network leads to contradictory constraints on memory models, as exemplified in previous work on accommodation of synfire chains. Here we show that these constraints can be overcome by introducing a'shadow' inhibitory pattern for each excitatory pattern of the model. This is interpreted as a doublebalance principle,whereby there exists both global balance between average excitatory and inhibitory currents and local balance between the currents carrying coherent activity at any given time frame. This principle can be applied to networks with Hebbian cell assemblies, leading to a high capacity of the associative memory. The number of possible patterns is limited by a combinatorial constraint that turns out to be P 0.06N within the specific model that we employ. This limit is reached by the Hebbian cell assembly network. To the best of our knowledge this is the first time that such high memory capacities are demonstrated in the asynchronous state of models of spiking neurons.


An Iterative Improvement Procedure for Hierarchical Clustering

Neural Information Processing Systems

We describe a procedure which finds a hierarchical clustering by hillclimbing. Thecost function we use is a hierarchical extension of the k-means cost; our local moves are tree restructurings and node reorderings. Weshow these can be accomplished efficiently, by exploiting special properties of squared Euclidean distances and by using techniques from scheduling algorithms.


Computing Gaussian Mixture Models with EM Using Equivalence Constraints

Neural Information Processing Systems

Density estimation with Gaussian Mixture Models is a popular generative techniqueused also for clustering. We develop a framework to incorporate side information in the form of equivalence constraints into the model estimation procedure. Equivalence constraints are defined on pairs of data points, indicating whether the points arise from the same source (positive constraints) or from different sources (negative constraints). Suchconstraints can be gathered automatically in some learning problems, and are a natural form of supervision in others. For the estimation of model parameters we present a closed form EM procedure which handles positive constraints, and a Generalized EM procedure using aMarkov net which handles negative constraints. Using publicly available data sets we demonstrate that such side information can lead to considerable improvement in clustering tasks, and that our algorithm is preferable to two other suggested methods using the same type of side information.


Warped Gaussian Processes

Neural Information Processing Systems

This allows for non-Gaussian processes and non-Gaussian noise. The learning algorithm choosesa nonlinear transformation such that transformed data is well-modelled by a GP. This can be seen as including a preprocessing transformation as an integral part of the probabilistic modelling problem, rather than as an ad-hoc step. We demonstrate on several real regression problems that learning the transformation can lead to significantly better performance than using a regular GP, or a GP with a fixed transformation.


Sparse Greedy Minimax Probability Machine Classification

Neural Information Processing Systems

The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracybound ฮฉ. The only assumptions that MPMC makes is that good estimates of means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally addingbasis functions (i.e.


Wormholes Improve Contrastive Divergence

Neural Information Processing Systems

In models that define probabilities via energies, maximum likelihood learning typically involves using Markov Chain Monte Carlo to sample from the model's distribution. If the Markov chain is started at the data distribution, learning often works well even if the chain is only run for a few time steps [3]. But if the data distribution contains modes separated by regions of very low density, brief MCMC will not ensure that different modes have the correct relative energies because it cannot move particles from one mode to another. We show how to improve brief MCMC by allowing long-range moves that are suggested by the data distribution. If the model is approximately correct, these long-range moves have a reasonable acceptance rate.


New Algorithms for Efficient High Dimensional Non-parametric Classification

Neural Information Processing Systems

This paper is about non-approximate acceleration of high dimensional nonparametric operations such as k nearest neighbor classifiers and the prediction phase of Support Vector Machine classifiers. We attempt to exploit the fact that even if we want exact answers to nonparametric queries, we usually do not need to explicitly find the datapoints close to the query, but merely need to ask questions about the properties about that set of datapoints. This offers a small amount of computational leeway, andwe investigate how much that leeway can be exploited. For clarity, this paper concentrates on pure k-NN classification and the prediction phaseof SVMs. We introduce new ball tree algorithms that on real-world datasets give accelerations of 2-fold up to 100-fold compared against highly optimized traditional ball-tree-based k-NN.