Goto

Collaborating Authors

 Asia


Means, Correlations and Bounds

Neural Information Processing Systems

The partition function for a Boltzmann machine can be bounded from above and below. We can use this to bound the means and the correlations. For networks with small weights, the values of these statistics can be restricted to nontrivial regions (i.e. a subset of [-1, 1]). Experimental results show that reasonable bounding occurs for weight sizes where mean field expansions generally give good results. 1 Introduction Over the last decade, bounding techniques have become a popular tool to deal with graphical models that are too complex for exact computation. A nice property of bounds is that they give at least some information you can rely on.


Boosting and Maximum Likelihood for Exponential Models

Neural Information Processing Systems

We derive an equivalence between AdaBoost and the dual of a convex optimization problem, showing that the only difference between minimizing the exponential loss used by AdaBoost and maximum likelihood for exponential models is that the latter requires the model to be normalized to form a conditional probability distribution over labels. In addition to establishing a simple and easily understood connection between the two methods, this framework enables us to derive new regularization procedures for boosting that directly correspond to penalized maximum likelihood. Experiments on UCI datasets support our theoretical analysis and give additional insight into the relationship between boosting and logistic regression.


Novel iteration schemes for the Cluster Variation Method

Neural Information Processing Systems

It has been noted by several authors that Belief Propagation can can also give impressive results for graphs that are not trees [2]. The Cluster Variation Method (CVM), is a method that has been developed in the physics community for approximate inference in the Ising model [3]. The CVM approximates the joint probability distribution by a number of (overlapping) marginal distributions (clusters). The quality of the approximation is determined by the size and number of clusters. When the clusters consist of only two variables, the method is known as the Bethe approximation.


On the Generalization Ability of On-Line Learning Algorithms

Neural Information Processing Systems

In this paper we show that online algorithms for classification and regression can be naturally used to obtain hypotheses with good datadependent tail bounds on their risk. Our results are proven without requiring complicated concentration-of-measure arguments and they hold for arbitrary online learning algorithms. Furthermore, when applied to concrete online algorithms, our results yield tail bounds that in many cases are comparable or better than the best known bounds.


Geometrical Singularities in the Neuromanifold of Multilayer Perceptrons

Neural Information Processing Systems

Singularities are ubiquitous in the parameter space of hierarchical models such as multilayer perceptrons. At singularities, the Fisher information matrix degenerates, and the Cramer-Rao paradigm does no more hold, implying that the classical model selection theory such as AIC and MDL cannot be applied. It is important to study the relation between the generalization error and the training error at singularities. The present paper demonstrates a method of analyzing these errors both for the maximum likelihood estimator and the Bayesian predictive distribution in terms of Gaussian random fields, by using simple models. 1 Introduction A neural network is specified by a number of parameters which are synaptic weights and biases. Learning takes place by modifying these parameters from observed input-output examples.


Correlation Codes in Neuronal Populations

Neural Information Processing Systems

Population codes often rely on the tuning of the mean responses to the stimulus parameters. However, this information can be greatly suppressed by long range correlations. Here we study the efficiency of coding information in the second order statistics of the population responses. We show that the Fisher Information of this system grows linearly with the size of the system. We propose a bilinear readout model for extracting information from correlation codes, and evaluate its performance in discrimination and estimation tasks. It is shown that the main source of information in this system is the stimulus dependence of the variances of the single neuron responses.


Information-Geometric Decomposition in Spike Analysis

Neural Information Processing Systems

We present an information-geometric measure to systematically investigate neuronal firing patterns, taking account not only of the second-order but also of higher-order interactions. We begin with the case of two neurons for illustration and show how to test whether or not any pairwise correlation in one period is significantly different from that in the other period. In order to test such a hypothesis of different firing rates, the correlation term needs to be singled out'orthogonally' to the firing rates, where the null hypothesis might not be of independent firing. This method is also shown to directly associate neural firing with behavior via their mutual information, which is decomposed into two types of information, conveyed by mean firing rate and coincident firing, respectively. Then, we show that these results, using the'orthogonal' decomposition, are naturally extended to the case of three neurons and n neurons in general. 1 Introduction Based on the theory of hierarchical structure and related invariant decomposition of interactions by information geometry [3], the present paper briefly summarizes methods useful for systematically analyzing a population of neural firing [9].


Self-regulation Mechanism of Temporally Asymmetric Hebbian Plasticity

Neural Information Processing Systems

Recent biological experimental findings have shown that the synaptic plasticity depends on the relative timing of the pre-and postsynaptic spikes which determines whether Long Term Potentiation (LTP) occurs or Long Term Depression (LTD) does. The synaptic plasticity has been called "Temporally Asymmetric Hebbian plasticity (TAH)". Many authors have numerically shown that spatiotemporal patterns can be stored in neural networks. However, the mathematical mechanism for storage of the spatiotemporal patterns is still unknown, especially the effects of LTD. In this paper, we employ a simple neural network model and show that interference of LTP and LTD disappears in a sparse coding scheme. On the other hand, it is known that the covariance learning is indispensable for storing sparse patterns. We also show that TAH qualitatively has the same effect as the covariance learning when spatiotemporal patterns are embedded in the network.


Associative memory in realistic neuronal networks

Neural Information Processing Systems

Almost two decades ago, Hopfield [1] showed that networks of highly reduced model neurons can exhibit multiple attracting fixed points, thus providing a substrate for associative memory. It is still not clear, however, whether realistic neuronal networks can support multiple attractors. The main difficulty is that neuronal networks in vivo exhibit a stable background state at low firing rate, typically a few Hz. Embedding attractor is easy; doing so without destabilizing the background is not. Previous work [2, 3] focused on the sparse coding limit, in which a vanishingly small number of neurons are involved in any memory.


3 state neurons for contextual processing

Neural Information Processing Systems

Neurons receive excitatory inputs via both fast AMPA and slow NMDA type receptors. We find that neurons receiving input via NMDA receptors can have two stable membrane states which are input dependent. Action potentials can only be initiated from the higher voltage state. Similar observations have been made in several brain areas which might be explained by our model. The interactions between the two kinds of inputs lead us to suggest that some neurons may operate in 3 states: disabled, enabled and firing. Such enabled, but non-firing modes can be used to introduce context-dependent processing in neural networks. We provide a simple example and discuss possible implications for neuronal processing and response variability.