Maximising Sensitivity in a Spiking Network

Neural Information Processing Systems

We use unsupervised probabilistic machine learning ideas to try to explain the kinds of learning observed in real neurons, the goal being to connect abstract principles of self-organisation to known biophysical processes. For example, we would like to explain Spike Timing-Dependent Plasticity (see [5,6] and Figure 3A) in terms of information theory. Starting out, we explore the optimisation of a network sensitivity measure related to maximising the mutual information between input spike timings and output spike timings. Our derivations are analogous to those in ICA, except that the sensitivity of output timings to input timings is maximised, rather than the sensitivity of output 'firing rates' to inputs. ICA and related approaches have been successful in explaining the learning of many properties of early visual receptive fields in rate coding models, and we are hoping for similar gains in understanding of spike coding in networks, and how this is supported, in principled probabilistic ways, by cellular biophysical processes. For now, in our initial simulations, we show that our derived rule can learn synaptic weights which can unmix, or demultiplex, mixed spike trains. That is, it can recover independent point processes embedded in distributed correlated input spike trains, using an adaptive single-layer feedforward spiking network.
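
The ICA analogy the abstract draws can be made concrete with the classical rate-based rule it generalises. Below is a minimal sketch of natural-gradient infomax ICA (Amari-style), not the paper's spike-timing rule: two independent sources are linearly mixed, and the sensitivity of the squashed outputs to the inputs is maximised until the unmixing matrix inverts the mixing up to scale and permutation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent super-Gaussian sources, linearly mixed.
n = 5000
S = rng.laplace(size=(2, n))            # independent sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])  # mixing matrix
X = A @ S                               # observed mixtures

# Natural-gradient infomax ICA: maximise the entropy (sensitivity)
# of the squashed outputs with respect to the inputs.
W = np.eye(2)
lr = 0.01
for _ in range(200):
    Y = W @ X
    g = np.tanh(Y)  # score function suited to super-Gaussian sources
    W += lr * (np.eye(2) - (g @ Y.T) / n) @ W

print("W @ A (should be close to a scaled permutation):")
print(W @ A)
```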


Linear Multilayer Independent Component Analysis for Large Natural Scenes

Neural Information Processing Systems

In this paper, linear multilayer ICA (LMICA) is proposed for extracting independent components from quite high-dimensional observed signals such as large-size natural scenes. There are two phases in each layer of LMICA. One is the mapping phase, where a one-dimensional mapping is formed by a stochastic gradient algorithm which incrementally moves more highly correlated (non-independent) signals nearer to each other. The other is the local-ICA phase, where each neighboring (namely, highly correlated) pair of signals in the mapping is separated by the MaxKurt algorithm. Because LMICA separates only the highly correlated pairs instead of all pairs, it can extract independent components quite efficiently from appropriate observed signals. In addition, it is proved that LMICA always converges. Some numerical experiments verify that LMICA is quite efficient and effective in large-size natural image processing.
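
The building block of the local-ICA phase is a two-signal separation by a kurtosis-maximising rotation. The sketch below is illustrative: it replaces MaxKurt's closed-form Givens angle with a simple numerical search over angles, applied to one whitened pair of mixed signals.

```python
import numpy as np

def kurt(z):
    # Excess kurtosis of a signal after standardisation.
    z = (z - z.mean()) / z.std()
    return np.mean(z**4) - 3.0

def separate_pair(x, y, n_angles=180):
    # Rotate a 2-signal block by the angle maximising the summed
    # absolute kurtosis of the outputs (a numerical stand-in for
    # the closed-form MaxKurt angle).
    best_theta, best_val = 0.0, -np.inf
    for theta in np.linspace(0.0, np.pi / 2, n_angles):
        c, s = np.cos(theta), np.sin(theta)
        u, v = c * x + s * y, -s * x + c * y
        val = abs(kurt(u)) + abs(kurt(v))
        if val > best_val:
            best_theta, best_val = theta, val
    c, s = np.cos(best_theta), np.sin(best_theta)
    return c * x + s * y, -s * x + c * y

rng = np.random.default_rng(1)
s1 = rng.laplace(size=4000); s1 /= s1.std()        # unit-variance sources
s2 = rng.uniform(-1, 1, size=4000); s2 /= s2.std()
x, y = 0.8 * s1 + 0.6 * s2, 0.6 * s1 - 0.8 * s2    # orthogonal mixing
u, v = separate_pair(x, y)
print("kurtosis of recovered pair:", kurt(u), kurt(v))
```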


Nearly Tight Bounds for the Continuum-Armed Bandit Problem

Neural Information Processing Systems

In the multi-armed bandit problem, an online algorithm must choose from a set of strategies in a sequence of n trials so as to minimize the total cost of the chosen strategies. While nearly tight upper and lower bounds are known in the case when the strategy set is finite, much less is known when there is an infinite strategy set.
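
A natural baseline for the continuum-armed setting, against which such bounds are usually measured, is to discretize the strategy interval and run a finite-armed algorithm on the grid. The sketch below is illustrative only (a grid plus UCB1-style indices on an assumed Lipschitz cost function), not necessarily the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(2)

def cost(x):
    # Unknown Lipschitz cost on [0, 1] plus noise (illustrative choice).
    return (x - 0.7) ** 2 + 0.1 * rng.normal()

n = 20000
K = int(round(n ** (1 / 3)))     # grid size ~ n^(1/3) balances the
arms = (np.arange(K) + 0.5) / K  # discretization and estimation errors
counts = np.zeros(K)
means = np.zeros(K)
total = 0.0

for t in range(1, n + 1):
    if t <= K:
        a = t - 1  # play each arm once
    else:          # lower-confidence-bound index (we minimise cost)
        a = int(np.argmin(means - np.sqrt(2 * np.log(t) / counts)))
    c = cost(arms[a])
    total += c
    counts[a] += 1
    means[a] += (c - means[a]) / counts[a]

print(f"grid size K={K}, average cost {total / n:.4f} (optimum ~0)")
```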


Learning first-order Markov models for control

Neural Information Processing Systems

First-order Markov models have been successfully applied to many problems, for example in modeling sequential data using Markov chains, and in modeling control problems using the Markov decision process (MDP) formalism. If a first-order Markov model's parameters are estimated from data, the standard maximum likelihood estimator considers only the first-order (single-step) transitions. But for many problems, the first-order conditional independence assumptions are not satisfied, and as a result the higher order transition probabilities may be poorly approximated. Motivated by the problem of learning an MDP's parameters for control, we propose an algorithm for learning a first-order Markov model that explicitly takes into account higher order interactions during training. Our algorithm uses an optimization criterion different from maximum likelihood, and allows us to learn models that capture longer range effects, but without giving up the benefits of using first-order Markov models. Our experimental results also show the new algorithm outperforming conventional maximum likelihood estimation in a number of control problems where the MDP's parameters are estimated from data.
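
A minimal sketch of the underlying idea, under an assumed objective (a weighted sum of one-step and two-step log-likelihoods, in the spirit of the abstract but not necessarily the paper's exact criterion): fit a first-order transition matrix so that its square also matches the empirical two-step transitions.

```python
import numpy as np

rng = np.random.default_rng(3)
S = 3  # number of states

# Simulate a process that violates the first-order assumption:
# the next state depends on the previous *two* states.
T2 = rng.dirichlet(np.ones(S), size=(S, S))  # P(s_t | s_{t-2}, s_{t-1})
seq = [0, 1]
for _ in range(20000):
    seq.append(rng.choice(S, p=T2[seq[-2], seq[-1]]))
seq = np.array(seq)

# Empirical one-step and two-step transition counts.
C1 = np.zeros((S, S)); C2 = np.zeros((S, S))
for a, b in zip(seq[:-1], seq[1:]): C1[a, b] += 1
for a, b in zip(seq[:-2], seq[2:]): C2[a, b] += 1

def rows_softmax(L):
    E = np.exp(L - L.max(axis=1, keepdims=True))
    return E / E.sum(axis=1, keepdims=True)

def objective(L, lam):
    # Weighted one-step plus two-step log-likelihood under T.
    T = rows_softmax(L)
    return (C1 * np.log(T)).sum() + lam * (C2 * np.log(T @ T)).sum()

def fit(lam, iters=300, lr=0.5, eps=1e-5):
    L = np.zeros((S, S))
    for _ in range(iters):
        G = np.zeros_like(L)
        for i in range(S):      # finite-difference gradient:
            for j in range(S):  # fine for a toy 3x3 problem
                L[i, j] += eps
                up = objective(L, lam)
                L[i, j] -= 2 * eps
                G[i, j] = (up - objective(L, lam)) / (2 * eps)
                L[i, j] += eps
        L += lr * G / C1.sum()
    return rows_softmax(L)

T_ml = C1 / C1.sum(axis=1, keepdims=True)  # ordinary maximum likelihood
T_mix = fit(lam=1.0)                       # ...plus a two-step term

for name, T in [("max-lik", T_ml), ("multi-step", T_mix)]:
    ll2 = (C2 * np.log(T @ T)).sum() / C2.sum()
    print(f"{name:10s} two-step log-likelihood per transition: {ll2:.4f}")
```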


Online Bounds for Bayesian Algorithms

Neural Information Processing Systems

We present a competitive analysis of Bayesian learning algorithms in the online learning setting and show that many simple Bayesian algorithms (such as Gaussian linear regression and Bayesian logistic regression) perform favorably when compared, in retrospect, to the single best model in the model class. The analysis does not assume that the Bayesian algorithms' modeling assumptions are "correct," and our bounds hold even if the data is adversarially chosen. For Gaussian linear regression (using log-loss), our error bounds are comparable to the best bounds in the online learning literature, and we also provide a lower bound showing that Gaussian linear regression is optimal in a certain worst-case sense. We also give bounds for some widely used maximum a posteriori (MAP) estimation algorithms, including regularized logistic regression.
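
The flavor of such competitive bounds can be seen in a tiny experiment with a finite model class, where Bayesian model averaging under log-loss trails the best model in hindsight by at most log M. This sketch uses Bernoulli models for simplicity (the paper treats Gaussian linear regression and logistic regression); the data's true bias deliberately lies outside the class.

```python
import numpy as np

rng = np.random.default_rng(4)

# Finite model class: Bernoulli models with different bias parameters.
thetas = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
M = len(thetas)

# Data from a coin whose true bias (0.65) is in no model.
x = (rng.random(2000) < 0.65).astype(float)

log_post = np.zeros(M)  # log of (unnormalized) posterior weights
bayes_loss = 0.0
for xt in x:
    # Predictive probability = posterior-weighted mixture of models.
    w = np.exp(log_post - log_post.max())
    w /= w.sum()
    p = (w * np.where(xt == 1, thetas, 1 - thetas)).sum()
    bayes_loss += -np.log(p)
    # Bayes update: multiply each model's weight by its likelihood.
    log_post += np.log(np.where(xt == 1, thetas, 1 - thetas))

best_loss = (-np.log(np.where(x[:, None] == 1, thetas, 1 - thetas))).sum(axis=0).min()
print(f"Bayes cumulative log-loss: {bayes_loss:.2f}")
print(f"best single model's loss:  {best_loss:.2f}")
print(f"gap: {bayes_loss - best_loss:.2f}  <=  log M = {np.log(M):.2f}")
```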


Efficient Kernel Machines Using the Improved Fast Gauss Transform

Neural Information Processing Systems

The computation and memory required for kernel machines with N training samples is at least O(N^2). Such a complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the computation to O(N). We also give an error bound for the approximation, and provide experimental results on the UCI datasets.
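
The core trick behind fast Gauss transforms is factorizing the Gaussian about an expansion center so that the source and target dependence separate, turning an O(NM) double sum into O(N + M) work per center. Below is a minimal 1D sketch with a single center and fixed truncation order; the improved fast Gauss transform adds source clustering, multivariate expansions, and error-driven truncation.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(5)

h = 1.0                              # Gaussian bandwidth
x = rng.normal(0.0, 0.3, size=2000)  # sources, clustered near 0
q = rng.random(2000)                 # source weights
y = rng.normal(0.0, 0.3, size=1000)  # targets

# Direct O(N*M) Gauss transform: G(y_j) = sum_i q_i exp(-(y_j-x_i)^2/h^2)
G_direct = (q[None, :] * np.exp(-((y[:, None] - x[None, :]) / h) ** 2)).sum(axis=1)

# Factorized form about a center c:
#   exp(-(y-x)^2/h^2) = exp(-(y-c)^2/h^2) * exp(-(x-c)^2/h^2)
#                       * sum_k (2/h^2)^k (y-c)^k (x-c)^k / k!
c = x.mean()
p = 16  # truncation order
# Source-side moments, computed once in O(N*p):
Mk = np.array([(q * np.exp(-((x - c) / h) ** 2) * (x - c) ** k).sum()
               for k in range(p)])
# Target-side evaluation in O(M*p):
coeff = np.array([(2 / h ** 2) ** k / factorial(k) for k in range(p)])
powers = (y[:, None] - c) ** np.arange(p)[None, :]
G_fast = np.exp(-((y - c) / h) ** 2) * (powers * coeff * Mk).sum(axis=1)

print("max abs error:", np.abs(G_fast - G_direct).max())
```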


Distributed Information Regularization on Graphs

Neural Information Processing Systems

We provide a principle for semi-supervised learning based on optimizing the rate of communicating labels for unlabeled points with side information. The side information is expressed in terms of identities of sets of points or regions with the purpose of biasing the labels in each region to be the same. The resulting regularization objective is convex, has a unique solution, and the solution can be found with a pair of local propagation operations on graphs induced by the regions. We analyze the properties of the algorithm and demonstrate its performance on document classification tasks.
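
A minimal sketch of a pair of local propagations of this flavor, following the abstract's description rather than the paper's exact updates: regions average the label distributions of their member points, and each unlabeled point resets its distribution to the normalized geometric mean of the regions containing it, with labeled points clamped.

```python
import numpy as np

# Toy chain of 7 points; points 0 and 6 are labeled (classes 0 and 1).
# Regions are overlapping neighbor pairs, biasing neighbors to agree.
n_pts, n_cls = 7, 2
regions = [(i, i + 1) for i in range(n_pts - 1)]
labeled = {0: 0, 6: 1}

p = np.full((n_pts, n_cls), 1.0 / n_cls)  # per-point label distributions
for i, y in labeled.items():
    p[i] = np.eye(n_cls)[y]

for _ in range(100):
    # Propagation 1: each region averages its members' distributions.
    q = np.array([p[list(r)].mean(axis=0) for r in regions])
    # Propagation 2: each unlabeled point takes the geometric mean of
    # the distributions of the regions containing it, renormalized.
    for i in range(n_pts):
        if i in labeled:
            continue
        logs = [np.log(q[k] + 1e-12) for k, r in enumerate(regions) if i in r]
        new = np.exp(np.mean(logs, axis=0))
        p[i] = new / new.sum()

np.set_printoptions(precision=3, suppress=True)
print(p)  # labels should interpolate smoothly along the chain
```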



PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data

Neural Information Processing Systems

We propose a "soft greedy" learning algorithm for building small conjunctions of simple threshold functions, called rays, defined on single real-valued attributes. We also propose a PAC-Bayes risk bound which is minimized for classifiers achieving a nontrivial tradeoff between sparsity (the number of rays used) and the magnitude of the separating margin of each ray. Finally, we test the soft greedy algorithm on four DNA micro-array data sets.
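
A ray here is a one-sided threshold test on a single attribute, and the classifier is a conjunction of such tests. The sketch below is an illustrative plain-greedy builder that scores candidate rays with an error-plus-sparsity tradeoff; the paper's "soft greedy" criterion and its PAC-Bayes bound are more refined.

```python
import numpy as np

rng = np.random.default_rng(6)

# Toy data: positive class is the conjunction (x0 > 0.5) AND (x2 <= 0.3).
X = rng.random((400, 5))
y = (X[:, 0] > 0.5) & (X[:, 2] <= 0.3)

def ray(X, attr, thresh, d):
    # A "ray": a one-sided threshold test on one real-valued attribute.
    return X[:, attr] > thresh if d > 0 else X[:, attr] <= thresh

def greedy_conjunction(X, y, max_rays=4, penalty=5.0):
    rays = []
    pred = np.ones(len(y), dtype=bool)  # empty conjunction: all positive
    for _ in range(max_rays):
        best = None
        for attr in range(X.shape[1]):
            for thresh in np.unique(X[:, attr]):
                for d in (+1, -1):
                    cand = pred & ray(X, attr, thresh, d)
                    score = (cand != y).sum() + penalty  # errors + sparsity
                    if best is None or score < best[0]:
                        best = (score, (attr, float(thresh), d), cand)
        if best[0] >= (pred != y).sum():  # adding a ray no longer pays: stop
            break
        rays.append(best[1])
        pred = best[2]
    return rays, pred

rays, pred = greedy_conjunction(X, y)
print("chosen rays (attribute, threshold, direction):", rays)
print("training error:", (pred != y).mean())
```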


Integrating Topics and Syntax

Neural Information Processing Systems

Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency. On tasks like part-of-speech tagging and document classification, this model is competitive with models that exclusively use short- and long-range dependencies, respectively.
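
The composite model routes each word through an HMM over syntactic classes, one of which is a special semantic slot that emits from the document's topic mixture rather than from a class-specific distribution. A minimal generative sketch with toy vocabularies and hand-set parameters (illustrative assumptions, not the paper's inferred values):

```python
import numpy as np

rng = np.random.default_rng(7)

# Syntactic classes: 0 = semantic (topic) slot, 1 = determiners, 2 = verbs.
class_words = {1: ["the", "a"], 2: ["runs", "grows"]}
topic_words = [["stock", "market", "profit"],    # topic 0
               ["neuron", "spike", "synapse"]]   # topic 1
trans = np.array([[0.1, 0.6, 0.3],               # HMM over classes
                  [0.9, 0.0, 0.1],
                  [0.7, 0.3, 0.0]])

def generate(n_words, alpha=0.5):
    # Document-level topic mixture (LDA-style Dirichlet draw).
    theta = rng.dirichlet([alpha] * len(topic_words))
    words, c = [], 1
    for _ in range(n_words):
        c = rng.choice(3, p=trans[c])
        if c == 0:  # semantic class: emit from the document's topics
            z = rng.choice(len(topic_words), p=theta)
            words.append(str(rng.choice(topic_words[z])))
        else:       # syntactic class: emit from class-specific words
            words.append(str(rng.choice(class_words[c])))
    return " ".join(words)

for _ in range(3):
    print(generate(10))
```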