Goto

Collaborating Authors

 Learning Graphical Models


Divisive Normalization, Line Attractor Networks and Ideal Observers

Neural Information Processing Systems

We explore in this study the statistical properties of this normalization in the presence of noise. Using simulations, we show that divisive normalization is a close approximation to a maximum likelihood estimator, which, in the context of population coding, is the same as an ideal observer. We also demonstrate analytically that this is a general property of a large class of nonlinear recurrent networks with line attractors. Our work suggests that divisive normalization plays a critical role in noise filtering, and that every cortical layer may be an ideal observer of the activity in the preceding layer. Information processing in the cortex is often formalized as a sequence of a linear stages followed by a nonlinearity.


Bayesian Modeling of Human Concept Learning

Neural Information Processing Systems

I consider the problem of learning concepts from small numbers of positive examples, a feat which humans perform routinely but which computers are rarely capable of. Bridging machine learning and cognitive science perspectives, I present both theoretical analysis and an empirical study with human subjects for the simple task oflearning concepts corresponding to axis-aligned rectangles in a multidimensional feature space. Existing learning models, when applied to this task, cannot explain how subjects generalize from only a few examples of the concept. I propose a principled Bayesian model based on the assumption that the examples are a random sample from the concept to be learned. The model gives precise fits to human behavior on this simple task and provides qualitati ve insights into more complex, realistic cases of concept learning.


Multiple Paired Forward-Inverse Models for Human Motor Learning and Control

Neural Information Processing Systems

Humans demonstrate a remarkable ability to generate accurate and appropriate motor behavior under many different and oftpn uncprtain environmental conditions. This paper describes a new modular approach to human motor learning and control, baspd on multiple pairs of inverse (controller) and forward (prpdictor) models. This architecture simultaneously learns the multiple inverse models necessary for control as well as how to select the inverse models appropriate for a given em'ironm0nt. Simulations of object manipulation demonstrates the ability to learn mUltiple objects, appropriate generalization to novel objects and the inappropriate activation of motor programs based on visual cues, followed by online correction, seen in the "size-weight illusion".


An Entropic Estimator for Structure Discovery

Neural Information Processing Systems

We introduce a novel framework for simultaneous structure and parameter learning in hidden-variable conditional probability models, based on an entropic prior and a solution for its maximum a posteriori (MAP) estimator. The MAP estimate minimizes uncertainty in all respects: cross-entropy between model and data; entropy of the model; entropy of the data's descriptive statistics. Iterative estimation extinguishes weakly supported parameters, compressing and sparsifying the model. Trimming operators accelerate this process by removing excess parameters and, unlike most pruning schemes, guarantee an increase in posterior probability. Entropic estimation takes a overcomplete random model and simplifies it, inducing the structure of relations between hidden and observed variables. Applied to hidden Markov models (HMMs), it finds a concise finite-state machine representing the hidden structure of a signal. We entropically model music, handwriting, and video time-series, and show that the resulting models are highly concise, structured, predictive, and interpretable: Surviving states tend to be highly correlated with meaningful partitions of the data, while surviving transitions provide a low-perplexity model of the signal dynamics.


Maximum Conditional Likelihood via Bound Maximization and the CEM Algorithm

Neural Information Processing Systems

Advantages in feature selection, robustness andlimited resource allocation have been studied. Ultimately, tasks such as regression and classification reduce to the evaluation of a conditional density. However, popularity of maximumjoint likelihood and EM techniques remains strong in part due to their elegance and convergence properties. Thus, many conditional problems are solved by first estimating joint models then conditioning them.


Tractable Variational Structures for Approximating Graphical Models

Neural Information Processing Systems

Graphical models provide a broad probabilistic framework with applications inspeech recognition (Hidden Markov Models), medical diagnosis (Belief networks) and artificial intelligence (Boltzmann Machines). However, the computing time is typically exponential in the number of nodes in the graph. Within the variational framework forapproximating these models, we present two classes of distributions, decimatableBoltzmann Machines and Tractable Belief Networks that go beyond the standard factorized approach. We give generalised mean-field equations for both these directed and undirected approximations. Simulation results on a small benchmark problemsuggest using these richer approximations compares favorably against others previously reported in the literature. 1 Introduction Graphical models provide a powerful framework for probabilistic inference[l] but suffer intractability when applied to large scale problems.


Risk Sensitive Reinforcement Learning

Neural Information Processing Systems

A directed generative model for binary data using a small number of hidden continuous units is investigated. The relationships between the correlations of the underlying continuous Gaussian variables and the binary output variables are utilized to learn the appropriate weights of the network. The advantages of this approach are illustrated on a translationally invariant binary distribution and on handwritten digit images. Introduction Principal Components Analysis (PCA) is a widely used statistical technique for representing data with a large number of variables [1]. It is based upon the assumption that although the data is embedded in a high dimensional vector space, most of the variability in the data is captured by a much lower climensional manifold. In particular for PCA, this manifold is described by a linear hyperplane whose characteristic directions are given by the eigenvectors of the correlation matrix with the largest eigenvalues. The success of PCA and closely related techniques such as Factor Analysis (FA) and PCA mixtures clearly indicate that much real world data exhibit the low dimensional manifold structure assumed by these models [2, 3].


DTs: Dynamic Trees

Neural Information Processing Systems

A dynamic tree model specifies a prior over a large number of trees, each one of which is a tree-structured belief net (TSBN) . Our aim is to retain the advantages of tree-structured belief networks, namely the hierarchical structure of the model and (in part) the efficient inference algorithms, while avoiding the "blocky" artifacts that derive from a single, fixed TSBN structure. One use for DTs is as prior models over labellings for image segmentation problems.


Inference in Multilayer Networks via Large Deviation Bounds

Neural Information Processing Systems

Arguably oneabilities of the most important types of information processing is the capacity for probabilistic reasoning. The properties of undirectedproDabilistic models represented as symmetric networks ethave been studied extensively using methods from statistical mechanics (Hertz aI, 1991). Detailed analyses of these models are possible by exploiting averaging that occur in the thermodynamic limit of large networks.phenomena In this paper, we analyze the limit of large, multilayer networks for probabilistic models represented as directed acyclic graphs. These models are known as Bayesian networks (Pearl, 1988; Neal, 1992), and they have different probabilistic semantics than symmetric neural networks (such as Hopfield models or Boltzmann machines). We show that the intractability of exact inference in multilayer Bayesian networks 261 Inference in Multilayer Networks via Large Deviation Bounds does not preclude their effective use. Our work builds on earlier studies of variational methods (Jordan et aI, 1997).


A Reinforcement Learning Algorithm in Partially Observable Environments Using Short-Term Memory

Neural Information Processing Systems

Since BLHT learns a stochastic model based on Bayesian Learning, the overfitting problemis reasonably solved. Moreover, BLHT has an efficient implementation. This paper shows that the model learned by BLHT converges toone which provides the most accurate predictions of percepts and rewards, given short-term memory. 1 INTRODUCTION Research on Reinforcement Learning (RL) problem forpartially observable environments is gaining more attention recently. This is mainly because the assumption that perfect and complete perception of the state of the environment is available for the learning agent, which many previous RL algorithms require, is not valid for many realistic environments.