Goto

Collaborating Authors

 Industry


Efficient sparse coding algorithms

Neural Information Processing Systems

Sparse coding provides a class of algorithms for finding succinct representations of stimuli; given only unlabeled input data, it discovers basis functions that capture higher-levelfeatures in the data. However, finding sparse codes remains a very difficult computational problem.


No-regret Algorithms for Online Convex Programs

Neural Information Processing Systems

Online convex programming has recently emerged as a powerful primitive for designing machine learning algorithms. For example, OCP can be used for learning alinear classifier, dynamically rebalancing a binary search tree, finding the shortest path in a graph with unknown edge lengths, solving a structured classification problem,or finding a good strategy in an extensive-form game. Several researchers have designed no-regret algorithms for OCP. But, compared to algorithms forspecial cases of OCP such as learning from expert advice, these algorithms are not very numerous or flexible. In learning from expert advice, one tool which has proved particularly valuable is the correspondence between no-regret algorithms and convex potential functions: by reasoning about these potential functions, researchers have designed algorithms with a wide variety of useful guarantees such as good performance when the target hypothesis is sparse. Until now, there has been no such recipe for the more general OCP problem, and therefore no ability to tune OCP algorithms to take advantage of properties of the problem or data. In this paper we derive a new class of no-regret learning algorithms forOCP. These Lagrangian Hedging algorithms are based on a general class of potential functions, and are a direct generalization of known learning rules like weighted majority and external-regret matching. In addition to proving regret bounds, we demonstrate our algorithms learning to play one-card poker.


Analysis of Empirical Bayesian Methods for Neuroelectromagnetic Source Localization

Neural Information Processing Systems

The ill-posed nature of the MEG/EEG source localization problem requires the incorporation of prior assumptions when choosing an appropriate solution out of an infinite set of candidates. Bayesian methods are useful in this capacity because they allow these assumptions to be explicitly quantified. Recently, a number of empirical Bayesian approaches have been proposed that attempt a form of model selection by using the data to guide the search for an appropriate prior. While seemingly quite different in many respects, we apply a unifying framework based on automatic relevance determination (ARD) that elucidates various attributes of these methods and suggests directions for improvement. We also derive theoretical propertiesof this methodology related to convergence, local minima, and localization bias and explore connections with established algorithms.



Multiple timescales and uncertainty in motor adaptation

Neural Information Processing Systems

For example, muscleresponse can change because of fatigue, a condition where the disturbance has a fast timescale or because of disease where the disturbance is much slower. Here we hypothesize that the nervous system adapts in a way that reflects the temporal properties of such potential disturbances. According to a Bayesian formulation of this idea, movement error results in a credit assignment problem:what timescale is responsible for this disturbance? The adaptation schedule influences the behavior of the optimal learner, changing estimates at different timescalesas well as the uncertainty. A system that adapts in this way predicts many properties observed in saccadic gain adaptation. It well predicts the timecourses of motor adaptation in cases of partial sensory deprivation and reversals of the adaptation direction.



Inferring Network Structure from Co-Occurrences

Neural Information Processing Systems

We consider the problem of inferring the structure of a network from cooccurrence data:observations that indicate which nodes occur in a signaling pathway but do not directly reveal node order within the pathway. This problem is motivated by network inference problems arising in computational biology and communication systems, in which it is difficult or impossible to obtain precise time ordering information. Without order information, every permutation of the activated nodes leads to a different feasible solution, resulting in combinatorial explosion of the feasible set. However, physical principles underlying most networked systemssuggest that not all feasible solutions are equally likely. Intuitively, nodes that cooccur more frequently are probably more closely connected. Building on this intuition, we model path co-occurrences as randomly shuffled samples of a random walk on the network. We derive a computationally efficient network inference algorithm and, via novel concentration inequalities for importance samplingestimators, prove that a polynomial complexity Monte Carlo version of the algorithm converges with high probability.


Adaptive Spatial Filters with predefined Region of Interest for EEG based Brain-Computer-Interfaces

Neural Information Processing Systems

The performance of EEGbased Brain-Computer-Interfaces (BCIs) critically depends onthe extraction of features from the EEG carrying information relevant for the classification of different mental states. For BCIs employing imaginary movements of different limbs, the method of Common Spatial Patterns (CSP) has been shown to achieve excellent classification results.


Greedy Layer-Wise Training of Deep Networks

Neural Information Processing Systems

Complexity theory of circuits strongly suggests that deep architectures can be much more efficient (sometimes exponentially) than shallow architectures, in terms of computational elements required to represent some functions. Deep multi-layer neural networks have many levels of non-linearities allowing them to compactly represent highly nonlinear and highly-varying functions. However, until recently it was not clear how to train such deep networks, since gradient-based optimization starting from random initialization appears to often get stuck in poor solutions.