
Laplace Propagation

Neural Information Processing Systems

We present a novel method for approximate inference in Bayesian models and regularized risk functionals. It is based on the propagation of mean and variance derived from the Laplace approximation of conditional probabilities in factorizing distributions, much akin to Minka's Expectation Propagation. In the jointly normal case, it coincides with the latter and belief propagation, whereas in the general case, it provides an optimization strategy containing Support Vector chunking, the Bayes Committee Machine, and Gaussian Process chunking as special cases.
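The abstract gives no implementation details, so the following is only a minimal sketch of the Laplace approximation the propagation scheme builds on, assuming a one-dimensional toy posterior: find the mode of the log-posterior and take the inverse curvature there as the variance. The toy negative log-posterior and step size are invented for illustration.

```python
# A minimal sketch of the Laplace approximation underlying the method:
# approximate a log-density by a Gaussian centred at its mode, with
# variance given by the inverse curvature at that mode.
import numpy as np
from scipy.optimize import minimize_scalar

def laplace_approx(neg_log_post, bounds=(-10.0, 10.0), h=1e-4):
    """Return (mean, variance) of the Gaussian Laplace approximation."""
    res = minimize_scalar(neg_log_post, bounds=bounds, method="bounded")
    m = res.x
    # Curvature of -log p at the mode via central finite differences.
    curv = (neg_log_post(m + h) - 2 * neg_log_post(m) + neg_log_post(m - h)) / h**2
    return m, 1.0 / curv

# Toy posterior: Gaussian likelihood x Gaussian prior, so the Laplace
# approximation is exact and easy to check by hand.
neg_log_post = lambda t: 0.5 * (t - 2.0) ** 2 + 0.5 * 0.1 * t ** 2
mean, var = laplace_approx(neg_log_post)
print(mean, var)  # mode ~ 1.818, variance ~ 1/1.1 ~ 0.909
```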



Approximate Expectation Maximization

Neural Information Processing Systems

The E-step boils down to computing probabilities of the hidden variables given the observed variables (evidence) and the current set of parameters. The M-step then, given these probabilities, yields a new set of parameters guaranteed to increase the likelihood. In Bayesian networks, which will be the focus of this article, the M-step is usually relatively straightforward. A complication may arise in the E-step, when computing the probability of the hidden variables given the evidence becomes intractable. An often-used approach is to replace the exact yet intractable inference in the E-step with approximate inference, either through sampling or using a deterministic variational method. The use of a "mean-field" variational method in this context leads to an algorithm known as variational EM, which can be given the interpretation of minimizing a free energy with respect to both a tractable approximate distribution (approximate E-step) and the parameters (M-step) [2]. Loopy belief propagation [3] and variants thereof, such as generalized belief propagation [4] and expectation propagation [5], have become popular alternatives to the "mean-field" variational approaches, often yielding somewhat better approximations. And indeed, they can and have been applied for approximate inference in the E-step of the EM algorithm (see e.g.
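To make the E-step / M-step split concrete, here is a toy, fully tractable EM loop for a two-component 1-D Gaussian mixture. Nothing in it comes from the paper, whose subject is the intractable case where this exact E-step must be replaced by an approximation; the data and initialization are invented.

```python
# Toy EM for a 1-D two-Gaussian mixture: the E-step computes exact
# posterior responsibilities, the M-step applies closed-form updates
# that provably increase the likelihood.
import numpy as np

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])

pi, mu, sigma = 0.5, np.array([-1.0, 1.0]), np.array([1.0, 1.0])
for _ in range(50):
    # E-step: posterior responsibility of component 1 for each point.
    p0 = (1 - pi) * np.exp(-0.5 * ((x - mu[0]) / sigma[0]) ** 2) / sigma[0]
    p1 = pi * np.exp(-0.5 * ((x - mu[1]) / sigma[1]) ** 2) / sigma[1]
    r = p1 / (p0 + p1)
    # M-step: re-estimate mixing weight, means, and standard deviations.
    pi = r.mean()
    mu = np.array([((1 - r) * x).sum() / (1 - r).sum(),
                   (r * x).sum() / r.sum()])
    sigma = np.array([
        np.sqrt(((1 - r) * (x - mu[0]) ** 2).sum() / (1 - r).sum()),
        np.sqrt((r * (x - mu[1]) ** 2).sum() / r.sum()),
    ])
print(pi, mu, sigma)
```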


Dopamine Modulation in a Basal Ganglio-Cortical Network of Working Memory

Neural Information Processing Systems

Dopamine exerts two classes of effect on the sustained neural activity in prefrontal cortex that underlies working memory. Direct release in the cortex increases the contrast of prefrontal neurons, enhancing the robustness of storage. Release of dopamine in the striatum is associated with salient stimuli and makes medium spiny neurons bistable; this modulation of the output of spiny neurons affects prefrontal cortex so as to indirectly gate access to working memory and additionally damp sensitivity to noise. Existing models have treated dopamine in one or other structure, or have addressed basal ganglia gating of working memory exclusive of dopamine effects. In this paper we combine these mechanisms and explore their joint effect. We model a memory-guided saccade task to illustrate how dopamine's actions lead to working memory that is selective for salient input and has increased robustness to distraction.


Robustness in Markov Decision Problems with Uncertain Transition Matrices

Neural Information Processing Systems

Optimal solutions to Markov Decision Problems (MDPs) are very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of those probabilities is far from accurate. Hence, estimation errors are limiting factors in applying MDPs to real-world problems. We propose an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors on the state transition probabilities.
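The paper works with particular uncertainty descriptions for the transition probabilities; the sketch below crudely stands in for those with a small finite set of candidate transition matrices per action, just to show the shape of a worst-case Bellman backup. The tiny 2-state MDP is invented for illustration.

```python
# Crude robust value iteration: each action carries a *set* of plausible
# transition matrices, and the backup takes the worst case over the set.
import numpy as np

n_states, gamma = 2, 0.9
# action -> list of candidate transition matrices (rows sum to 1)
P = {
    0: [np.array([[0.9, 0.1], [0.2, 0.8]]),
        np.array([[0.7, 0.3], [0.4, 0.6]])],
    1: [np.array([[0.5, 0.5], [0.5, 0.5]]),
        np.array([[0.6, 0.4], [0.3, 0.7]])],
}
R = {0: np.array([1.0, 0.0]), 1: np.array([0.5, 0.5])}

V = np.zeros(n_states)
for _ in range(200):
    Q = np.stack([
        R[a] + gamma * np.min(np.stack([T @ V for T in P[a]]), axis=0)
        for a in P
    ])
    V = Q.max(axis=0)  # best action under the worst-case transitions
print(V)
```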


On the Concentration of Expectation and Approximate Inference in Layered Networks

Neural Information Processing Systems

We present an analysis of concentration-of-expectation phenomena in layered Bayesian networks that use generalized linear models as the local conditional probabilities. This framework encompasses a wide variety of probability distributions, including both discrete and continuous random variables. We utilize ideas from large deviation analysis and the delta method to devise and evaluate a class of approximate inference algorithms for layered Bayesian networks that have superior asymptotic error bounds and very fast computation time.
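For readers unfamiliar with the delta method named above, here it is in its simplest univariate form, checked by Monte Carlo: if X has mean mu and small variance s^2, then f(X) has approximately mean f(mu) and variance f'(mu)^2 s^2. The tanh link is an invented stand-in for a generalized linear model's local conditional; none of this is specific to the paper's networks.

```python
# Univariate delta method: Var[f(X)] ~ f'(mu)^2 * Var[X] for small noise.
import numpy as np

mu, s = 2.0, 0.05
f = np.tanh                          # a smooth link, as in a GLM layer
fprime = lambda t: 1 - np.tanh(t) ** 2

x = np.random.default_rng(4).normal(mu, s, 1_000_000)
print(f(x).var(), (fprime(mu) * s) ** 2)  # the two should nearly agree
```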


Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence

Neural Information Processing Systems

The problem of "Structure From Motion" is a central problem in vision: given the 2D locations of certain points we wish to recover the camera motion and the 3D coordinates of the points. Under simplified camera models, the problem reduces to factorizing a measurement matrix into the product of two low-rank matrices. Each element of the measurement matrix contains the position of a point in a particular image. When all elements are observed, the problem can be solved trivially using SVD, but in any realistic situation many elements of the matrix are missing and the ones that are observed have a different directional uncertainty. Under these conditions, most existing factorization algorithms fail while human perception is relatively unchanged. In this paper we use the well-known EM algorithm for factor analysis to perform factorization. This allows us to easily handle missing data and measurement uncertainty and, more importantly, allows us to place a prior on the temporal trajectory of the latent variables (the camera position). We show that incorporating this prior gives a significant improvement in performance in challenging image sequences.
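A sketch of the fully observed case the abstract mentions: factor the measurement matrix into a "motion" factor and a "structure" factor by truncating the SVD at the model rank. The random data below is purely illustrative; real measurement matrices come from tracked image points, and the paper's EM approach is for the missing-data case this does not handle.

```python
# Fully observed factorization: M (2F x P stacked image coordinates)
# factors exactly into motion x structure when M has the model rank.
import numpy as np

rng = np.random.default_rng(1)
F, P, r = 5, 20, 3                       # frames, points, rank
M = rng.normal(size=(2 * F, r)) @ rng.normal(size=(r, P))

U, s, Vt = np.linalg.svd(M, full_matrices=False)
motion = U[:, :r] * s[:r]                # 2F x r "camera" factor
structure = Vt[:r]                       # r x P "shape" factor
print(np.allclose(M, motion @ structure))  # True: exact rank-r recovery
```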


When Does Non-Negative Matrix Factorization Give a Correct Decomposition into Parts?

Neural Information Processing Systems

We interpret nonnegative matrix factorization geometrically, as the problem of finding a simplicial cone which contains a cloud of data points and which is contained in the positive orthant. We show that under certain conditions, basically requiring that some of the data are spread across the faces of the positive orthant, there is a unique such simplicial cone. We give examples of synthetic image articulation databases which obey these conditions; these require separated support and factorial sampling. For such databases there is a generative model in terms of 'parts' and NMF correctly identifies the 'parts'. We show that our theoretical results are predictive of the performance of published NMF code, by running the published algorithms on one of our synthetic image articulation databases.
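For reference, here are the standard Lee-Seung multiplicative updates, one common example of "published NMF code" (not necessarily the exact algorithms the authors ran). Given a nonnegative matrix V, they seek nonnegative W and H with V approximately equal to W @ H; the random test matrix is invented.

```python
# Lee-Seung multiplicative updates for NMF under squared error.
import numpy as np

def nmf(V, r, iters=500, eps=1e-9, seed=0):
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W, H = rng.random((n, r)), rng.random((r, m))
    for _ in range(iters):
        # Each update keeps its factor nonnegative and never increases
        # the reconstruction error ||V - WH||^2.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.random.default_rng(2).random((6, 8))
W, H = nmf(V, r=3)
print(np.linalg.norm(V - W @ H))  # residual after fitting
```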



Learning the k in k-means

Neural Information Processing Systems

When clustering a dataset, the right number k of clusters to use is often not obvious, and choosing k automatically is a hard algorithmic problem. In this paper we present an improved algorithm for learning k while clustering. The G-means algorithm is based on a statistical test for the hypothesis that a subset of data follows a Gaussian distribution. G-means runs k-means with increasing k in a hierarchical fashion until the test accepts the hypothesis that the data assigned to each k-means center are Gaussian. Two key advantages are that the hypothesis test does not limit the covariance of the data and does not compute a full covariance matrix. Additionally, G-means only requires one intuitive parameter, the standard statistical significance level α. We present results from experiments showing that the algorithm works well, and better than a recent method based on the BIC penalty for model complexity. In these experiments, we show that the BIC is ineffective as a scoring function, since it does not penalize the model's complexity strongly enough.
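A sketch of the core per-cluster test, under the usual description of G-means: split a centre in two, run 2-means, project the points onto the line joining the two children, and test that 1-D projection for normality. The Anderson-Darling statistic is assumed here (it is how G-means is commonly presented), with scipy's implementation as a stand-in; the initialization and data are invented.

```python
# One-cluster G-means-style test: is this cluster's data Gaussian along
# the direction that a 2-way split would separate it?
import numpy as np
from scipy.stats import anderson

def looks_gaussian(X, alpha_index=2):
    """Accept 'Gaussian' if the projected data passes Anderson-Darling
    (alpha_index=2 picks scipy's 5% critical value)."""
    c, d = X.mean(0), X.std(0) + 1e-9
    c0, c1 = c - d, c + d                # children start at mean +/- std
    for _ in range(20):                  # a few Lloyd iterations of 2-means
        assign = np.linalg.norm(X - c0, axis=1) > np.linalg.norm(X - c1, axis=1)
        c0, c1 = X[~assign].mean(0), X[assign].mean(0)
    v = c1 - c0                          # direction connecting the children
    proj = X @ v / (v @ v)               # 1-D projection of every point
    res = anderson((proj - proj.mean()) / proj.std())
    return res.statistic < res.critical_values[alpha_index]

rng = np.random.default_rng(3)
print(looks_gaussian(rng.normal(size=(500, 2))))               # one blob: True
print(looks_gaussian(np.vstack([rng.normal(-3, 1, (250, 2)),
                                rng.normal(3, 1, (250, 2))])))  # two blobs: False
```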