Goto

Collaborating Authors

 Country


Bayesian Modelling of fMRI lime Series

Neural Information Processing Systems

We present a Hidden Markov Model (HMM) for inferring the hidden psychological state (or neural activity) during single trial tMRI activation experimentswith blocked task paradigms. Inference is based on Bayesian methodology, using a combination of analytical and a variety of Markov Chain Monte Carlo (MCMC) sampling techniques. The advantage ofthis method is that detection of short time learning effects between repeated trials is possible since inference is based only on single trial experiments.


A Neuromorphic VLSI System for Modeling the Neural Control of Axial Locomotion

Neural Information Processing Systems

We have developed and tested an analog/digital VLSI system that models thecoordination of biological segmental oscillators underlying axial locomotion in animals such as leeches and lampreys. In its current form the system consists of a chain of twelve pattern generating circuits that are capable of arbitrary contralateral inhibitory synaptic coupling. Each pattern generating circuit is implemented with two independent silicon Morris-Lecar neurons with a total of 32 programmable (floating-gate based) inhibitory synapses, and an asynchronous address-event interconnection elementthat provides synaptic connectivity and implements axonal delay. We describe and analyze the data from a set of experiments exploringthe system behavior in terms of synaptic coupling.



Predictive App roaches for Choosing Hyperparameters in Gaussian Processes

Neural Information Processing Systems

Gaussian Processes are powerful regression models specified by parametrized mean and covariance functions. Standard approaches to estimate these parameters (known by the name Hyperparameters) areMaximum Likelihood (ML) and Maximum APosterior (MAP) approaches. In this paper, we propose and investigate predictive approaches,namely, maximization of Geisser's Surrogate Predictive Probability (GPP) and minimization of mean square error withrespect to GPP (referred to as Geisser's Predictive mean square Error (GPE)) to estimate the hyperparameters. We also derive results for the standard Cross-Validation (CV) error and make a comparison. These approaches are tested on a number of problems and experimental results show that these approaches are strongly competitive to existing approaches. 1 Introduction Gaussian Processes (GPs) are powerful regression models that have gained popularity recently,though they have appeared in different forms in the literature for years.


Training Data Selection for Optimal Generalization in Trigonometric Polynomial Networks

Neural Information Processing Systems

In this paper, we consider the problem of active learning in trigonometric polynomialnetworks and give a necessary and sufficient condition of sample points to provide the optimal generalization capability. By analyzing thecondition from the functional analytic point of view, we clarify the mechanism of achieving the optimal generalization capability. We also show that a set of training examples satisfying the condition does not only provide the optimal generalization but also reduces the computational complexityand memory required for the calculation of learning results. Finally, examples of sample points satisfying the condition are given and computer simulations are performed to demonstrate the effectiveness ofthe proposed active learning method. 1 Introduction Supervised learning is obtaining an underlying rule from training examples, and can be formulated as a function approximation problem. If sample points are actively designed, then learning can be performed more efficiently. In this paper, we discuss the problem of designing sample points, referred to as active learning, for optimal generalization. Active learning is classified into two categories depending on the optimality. One is global optimal, where a set of all training examples is optimal (e.g.


Greedy Importance Sampling

Neural Information Processing Systems

I present a simple variation of importance sampling that explicitly searches forimportant regions in the target distribution. I prove that the technique yieldsunbiased estimates, and show empirically it can reduce the variance of standard Monte Carlo estimators. This is achieved by concentrating samplesin more significant regions of the sample space. 1 Introduction It is well known that general inference and learning with graphical models is computationally hard[1] and it is therefore necessary to consider restricted architectures [13], or approximate algorithms to perform these tasks [3, 7]. Among the most convenient and successful techniques are stochastic methods which are guaranteed to converge to a correct solution in the limit oflarge samples [10, 11, 12, 15]. These methods can be easily applied to complex inference problems that overwhelm deterministic approaches.


The Infinite Gaussian Mixture Model

Neural Information Processing Systems

In a Bayesian mixture model it is not necessary a priori to limit the number ofcomponents to be finite. In this paper an infinite Gaussian mixture model is presented which neatly sidesteps the difficult problem of finding the"right" number of mixture components. Inference in the model is done using an efficient parameter-free Markov Chain that relies entirely on Gibbs sampling.


A Multi-class Linear Learning Algorithm Related to Winnow

Neural Information Processing Systems

Committee is an algorithm forcombining the predictions of a set of sub-experts in the online mistake-bounded model oflearning. A sub-expert is a special type of attribute that predicts with a distribution over a finite number of classes. Committee learns a linear function of sub-experts and uses this function to make class predictions. We provide bounds for Committee that show it performs well when the target can be represented by a few relevant sub-experts. We also show how Committee can be used to solve more traditional problems composed of attributes. This leads to a natural extension thatlearns on multi-class problems that contain both traditional attributes and sub-experts.


The Relaxed Online Maximum Margin Algorithm

Neural Information Processing Systems

We describe a new incremental algorithm for training linear threshold functions:the Relaxed Online Maximum Margin Algorithm, or ROMMA. ROMMA can be viewed as an approximation to the algorithm that repeatedly chooses the hyperplane that classifies previously seen examples correctlywith the maximum margin. It is known that such a maximum-margin hypothesis can be computed by minimizing the length of the weight vector subject to a number of linear constraints. ROMMA works by maintaining a relatively simple relaxation of these constraints that can be efficiently updated. We prove a mistake bound for ROMMA that is the same as that proved for the perceptron algorithm. Our analysis implies that the more computationally intensive maximum-margin algorithm alsosatisfies this mistake bound; this is the first worst-case performance guaranteefor this algorithm. We describe some experiments using ROMMA and a variant that updates its hypothesis more aggressively as batch algorithms to recognize handwritten digits. The computational complexity and simplicity of these algorithms is similar to that of perceptron algorithm,but their generalization is much better. We describe a sense in which the performance of ROMMA converges to that of SVM in the limit if bias isn't considered.


Inference for the Generalization Error

Neural Information Processing Systems

In order to to compare learning algorithms, experimental results reported in the machine learning litterature often use statistical tests of significance. Unfortunately,most of these tests do not take into account the variability due to the choice of training set. We perform a theoretical investigation of the variance of the cross-validation estimate of the generalization errorthat takes into account the variability due to the choice of training sets. This allows us to propose two new ways to estimate this variance. We show, via simulations, that these new statistics perform well relative to the statistics considered by Dietterich (Dietterich, 1998). 1 Introduction When applying a learning algorithm (or comparing several algorithms), one is typically interested in estimating its generalization error. Its point estimation is rather trivial through cross-validation. Providing a variance estimate of that estimation, so that hypothesis testing and/orconfidence intervals are possible, is more difficult, especially, as pointed out in (Hinton et aI., 1995), if one wants to take into account the variability due to the choice of the training sets (Breiman, 1996). A notable effort in that direction is Dietterich's work (Dietterich, 1998).Careful investigation of the variance to be estimated allows us to provide new variance estimates, which tum out to perform well. Let us first layout the framework in which we shall work.