Country
A neural network implementing optimal state estimation based on dynamic spike train decoding
Bobrowski, Omer, Meir, Ron, Shoham, Shy, Eldar, Yonina
It is becoming increasingly evident that organisms acting in uncertain dynamical environments often employ exact or approximate Bayesian statistical calculations in order to continuously estimate the environmental state, integrate information from multiple sensory modalities, form predictions and choose actions. What is less clear is how these putative computations are implemented by cortical neural networks. An additional level of complexity is introduced because these networks observe the world through spike trains received from primary sensory afferents, rather than directly. A recent line of research has described mechanisms by which such computations can be implemented using a network of neurons whose activity directly represents a probability distribution across the possible "world states". Much of this work, however, uses various approximations, which severely restrict the domain of applicability of these implementations. Here we make use of rigorous mathematical results from the theory of continuous time point process filtering, and show how optimal real-time state estimation and prediction may be implemented in a general setting using linear neural networks. We demonstrate the applicability of the approach with several examples, and relate the required network properties to the statistical nature of the environment, thereby quantifying the compatibility of a given network with its environment.
Invariant Common Spatial Patterns: Alleviating Nonstationarities in Brain-Computer Interfacing
Blankertz, Benjamin, Kawanabe, Motoaki, Tomioka, Ryota, Hohlefeld, Friederike, Müller, Klaus-Robert, Nikulin, Vadim V.
Brain-Computer Interfaces can suffer from a large variance of the subject conditions within and across sessions. For example vigilance fluctuations in the individual, variable task involvement, workload etc. alter the characteristics of EEG signals and thus challenge a stable BCI operation. In the present work we aim to define features based on a variant of the common spatial patterns (CSP) algorithm that are constructed invariant with respect to such nonstationarities. We enforce invariance properties by adding terms to the denominator of a Rayleigh coefficient representation of CSP such as disturbance covariance matrices from fluctuations in visual processing. In this manner physiological prior knowledge can be used to shape the classification engine for BCI. As a proof of concept we present a BCI classifier that is robust to changes in the level of parietal α -activity. In other words, the EEG decoding still works when there are lapses in vigilance.
Incremental Natural Actor-Critic Algorithms
Bhatnagar, Shalabh, Ghavamzadeh, Mohammad, Lee, Mark, Sutton, Richard S.
We present four new reinforcement learning algorithms based on actor-critic and natural-gradient ideas, and provide their convergence proofs. Actor-critic reinforcement learning methods are online approximations to policy iteration in which the value-function parameters are estimated using temporal difference learning and the policy parameters are updated by stochastic gradient descent. Methods based on policy gradients in this way are of special interest because of their compatibility with function approximation methods, which are needed to handle large or infinite state spaces. The use of temporal difference learning in this way is of interest because in many applications it dramatically reduces the variance of the gradient estimates. The use of the natural gradient is of interest because it can produce better conditioned parameterizations and has been shown to further reduce variance in some cases. Our results extend prior two-timescale convergence results for actor-critic methods by Konda and Tsitsiklis by using temporal difference learning in the actor and by incorporating natural gradients, and they extend prior empirical studies of natural actor-critic methods by Peters, Vijayakumar and Schaal by providing the first convergence proofs and the first fully incremental algorithms.
Near-Maximum Entropy Models for Binary Neural Representations of Natural Images
Bethge, Matthias, Berens, Philipp
Maximum entropy analysis of binary variables provides an elegant way for studying the role of pairwise correlations in neural populations. Unfortunately, these approaches suffer from their poor scalability to high dimensions. In sensory coding, however, high-dimensional data is ubiquitous. Here, we introduce a new approach using a near-maximum entropy model, that makes this type of analysis feasible for very high-dimensional data--the model parameters can be derived in closed form and sampling is easy. Therefore, our NearMaxEnt approach can serve as a tool for testing predictions from a pairwise maximum entropy model not only for low-dimensional marginals, but also for high dimensional measurements of more than thousand units. We demonstrate its usefulness by studying natural images with dichotomized pixel intensities. Our results indicate that the statistics of such higher-dimensional measurements exhibit additional structure that are not predicted by pairwise correlations, despite the fact that pairwise correlations explain the lower-dimensional marginal statistics surprisingly well up to the limit of dimensionality where estimation of the full joint distribution is feasible.
Comparing Bayesian models for multisensory cue combination without mandatory integration
Beierholm, Ulrik, Shams, Ladan, Ma, Wei J., Koerding, Konrad
Bayesian models of multisensory perception traditionally address the problem of estimating an underlying variable that is assumed to be the cause of the two sensory signals. The brain, however, has to solve a more general problem: it also has to establish which signals come from the same source and should be integrated, and which ones do not and should be segregated. In the last couple of years, a few models have been proposed to solve this problem in a Bayesian fashion. One of these has the strength that it formalizes the causal structure of sensory signals. We first compare these models on a formal level. Furthermore, we conduct a psychophysics experiment to test human performance in an auditory-visual spatial localization task in which integration is not mandatory. We find that the causal Bayesian inference model accounts for the data better than other models.
Adaptive Online Gradient Descent
Hazan, Elad, Rakhlin, Alexander, Bartlett, Peter L.
We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive Online Gradient Descent, which interpolates between the results of Zinkevich for linear functions and of Hazan et al for strongly convex functions, achieving intermediate rates between T and log T. Furthermore, we show strong optimality of the algorithm. Finally, we provide an extension of our results to general norms.
A Spectral Regularization Framework for Multi-Task Structure Learning
Argyriou, Andreas, Pontil, Massimiliano, Ying, Yiming, Micchelli, Charles A.
Learning the common structure shared by a set of supervised tasks is an important practical and theoretical problem. Knowledge of this structure may lead to better generalization performance on the tasks and may also facilitate learning new tasks. We propose a framework for solving this problem, which is based on regularization with spectral functions of matrices. This class of regularization problems exhibits appealing computational properties and can be optimized efficiently by an alternating minimization algorithm. In addition, we provide a necessary and sufficient condition for convexity of the regularizer.
Inferring Elapsed Time from Stochastic Neural Processes
Ahrens, Misha, Sahani, Maneesh
Many perceptual processes and neural computations, such as speech recognition, motor control and learning, depend on the ability to measure and mark the passage of time. However, the processes that make such temporal judgements possible are unknown. A number of different hypothetical mechanisms have been advanced, all of which depend on the known, temporally predictable evolution of a neural or psychological state, possibly through oscillations or the gradual decay of a memory trace. Alternatively, judgements of elapsed time might be based on observations of temporally structured, but stochastic processes. Such processes need not be specific to the sense of time; typical neural and sensory processes contain at least some statistical structure across a range of time scales. Here, we investigate the statistical properties of an estimator of elapsed time which is based on a simple family of stochastic process.
Discriminative K-means for Clustering
Ye, Jieping, Zhao, Zheng, Wu, Mingrui
We present a theoretical study on the discriminative clustering framework, recently proposed for simultaneous subspace selection via linear discriminant analysis (LDA) and clustering. Empirical results have shown its favorable performance in comparison with several other popular clustering algorithms. However, the inherent relationship between subspace selection and clustering in this framework is not well understood, due to the iterative nature of the algorithm. We show in this paper that this iterative subspace selection and clustering is equivalent to kernel K-means with a specific kernel Gram matrix. This provides significant and new insights into the nature of this subspace selection procedure. Based on this equivalence relationship, we propose the Discriminative K-means (DisKmeans) algorithm for simultaneous LDA subspace selection and clustering, as well as an automatic parameter estimation procedure. We also present the nonlinear extension of DisKmeans using kernels. We show that the learning of the kernel matrix over a convex set of pre-specified kernel matrices can be incorporated into the clustering formulation. The connection between DisKmeans and several other clustering algorithms is also analyzed. The presented theories and algorithms are evaluated through experiments on a collection of benchmark data sets.
Gaussian Process Models for Link Analysis and Transfer Learning
In this paper we develop a Gaussian process (GP) framework to model a collection of reciprocal random variables defined on the \emph{edges} of a network. We show how to construct GP priors, i.e.,~covariance functions, on the edges of directed, undirected, and bipartite graphs. The model suggests an intimate connection between \emph{link prediction} and \emph{transfer learning}, which were traditionally considered two separate research topics. Though a straightforward GP inference has a very high complexity, we develop an efficient learning algorithm that can handle a large number of observations. The experimental results on several real-world data sets verify superior learning capacity.