The Author-Topic Model for Authors and Documents

Machine Learning

We introduce the author-topic model, a generative model for documents that extends Latent Dirichlet Allocation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is associated with a multinomial distribution over topics and each topic is associated with a multinomial distribution over words. A document with multiple authors is modeled as a distribution over topics that is a mixture of the distributions associated with the authors. We apply the model to a collection of 1,700 NIPS conference papers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets and we use Gibbs sampling to estimate the topic and author distributions. We compare the performance with two other generative models for documents, which are special cases of the author-topic model: LDA (a topic model) and a simple author model in which each author is associated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic model, and demonstrate applications to computing similarity between authors and entropy of author output.
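The generative process described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the uniform choice among co-authors, the Dirichlet draws used to build example distributions, and all names (`generate_document`, `theta`, `phi`, the sizes `A`, `T`, `V`) are assumptions introduced here.

```python
import numpy as np

def generate_document(authors, theta, phi, n_words, rng):
    """Sample one document from an author-topic style generative process.

    authors : list of author indices for this document
    theta   : (A, T) array, row a is author a's distribution over topics
    phi     : (T, V) array, row t is topic t's distribution over words
    """
    n_topics, vocab_size = phi.shape
    words, assignments = [], []
    for _ in range(n_words):
        # Assumption: each word picks one of the document's authors uniformly.
        a = int(rng.choice(authors))
        # Draw a topic from that author's distribution over topics.
        t = int(rng.choice(n_topics, p=theta[a]))
        # Draw a word from that topic's distribution over words.
        w = int(rng.choice(vocab_size, p=phi[t]))
        words.append(w)
        assignments.append((a, t))
    return words, assignments

# Hypothetical sizes: 3 authors, 4 topics, 10 vocabulary words.
rng = np.random.default_rng(0)
A, T, V = 3, 4, 10
theta = rng.dirichlet(np.full(T, 0.5), size=A)  # author-topic distributions
phi = rng.dirichlet(np.full(V, 0.1), size=T)    # topic-word distributions

# A 20-word document written jointly by authors 0 and 2.
words, assignments = generate_document([0, 2], theta, phi, n_words=20, rng=rng)
```

Because every word draws its topic from one of the listed authors, the document's topic distribution is a mixture of those authors' rows of `theta`. Restricting each document to a single unique author gives an LDA-like model, which is one way to read the "special case" remark in the abstract.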

Execution Monitoring with Quantitative Temporal Bayesian Networks

AAAI Conferences

The goal of execution monitoring is to determine whether a system or person is following a plan appropriately. Monitoring information may be uncertain, and the plan being monitored may have complex temporal constraints. We develop a new framework for reasoning under uncertainty with quantitative temporal constraints, Quantitative Temporal Bayesian Networks (QTBNs), and we discuss its application to plan-execution monitoring. QTBNs extend the major previous approaches to temporal reasoning under uncertainty: Time Nets (Kanazawa 1991), Dynamic Bayesian Networks (DBNs), and Dynamic Object Oriented Bayesian Networks (DOOBNs) (Friedman, Koller, & Pfeffer 1998). We argue that Time Nets can model quantitative temporal relationships but cannot easily model the changing values of fluents, while DBNs and DOOBNs naturally model fluents but not quantitative temporal relationships. Both capabilities are required for execution monitoring, and both are supported by QTBNs.

Bayesian Self-Organization

Neural Information Processing Systems

Smirnakis, Lyman Laboratory of Physics, Harvard University, Cambridge, MA 02138; Lei Xu*, Dept. of Computer Science, HSH ENG BLDG, Room 1006, The Chinese University of Hong Kong, Shatin, NT, Hong Kong

Abstract

Recent work by Becker and Hinton (Becker and Hinton, 1992) shows a promising mechanism, based on maximizing mutual information assuming spatial coherence, by which a system can self-organize itself to learn visual abilities such as binocular stereo. We introduce a more general criterion, based on Bayesian probability theory, and thereby demonstrate a connection to Bayesian theories of visual perception and to other organization principles for early vision (Atick and Redlich, 1990). Methods for implementation using variants of stochastic learning are described and, for the special case of linear filtering, we derive an analytic expression for the output.

1 Introduction

The input intensity patterns received by the human visual system are typically complicated functions of the object surfaces and light sources in the world. It [...] Thus the visual system must be able to extract information from the input intensities that is relatively independent of the actual intensity values.

*Lei Xu was a research scholar in the Division of Applied Sciences at Harvard University while this work was performed.

Model Selection Through Sparse Maximum Likelihood Estimation

Artificial Intelligence

We consider the problem of estimating the parameters of a Gaussian or binary distribution in such a way that the resulting undirected graphical model is sparse. Our approach is to solve a maximum likelihood problem with an added l_1-norm penalty term. The problem as formulated is convex, but the memory requirements and complexity of existing interior point methods are prohibitive for problems with more than tens of nodes. We present two new algorithms for solving problems with at least a thousand nodes in the Gaussian case. Our first algorithm uses block coordinate descent, and can be interpreted as recursive l_1-norm penalized regression. Our second algorithm, based on Nesterov's first-order method, yields a complexity estimate with a better dependence on problem size than existing interior point methods. Using a log determinant relaxation of the log partition function (Wainwright & Jordan, 2006), we show that these same algorithms can be used to solve an approximate sparse maximum likelihood problem for the binary case. We test our algorithms on synthetic data, as well as on gene expression and senate voting records data.
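For the Gaussian case, the kind of objective this abstract describes can be written down in a few lines. The sketch below only evaluates a penalized log-likelihood; it does not implement either of the paper's algorithms. The Gaussian log-likelihood of a precision matrix X given a sample covariance S is, up to constants and scaling, log det X - trace(S X). The exact penalty form used here (the sum of absolute values of all entries, diagonal included) and the names `penalized_log_likelihood` and `lam` are assumptions.

```python
import numpy as np

def penalized_log_likelihood(X, S, lam):
    """Evaluate log det X - trace(S X) - lam * sum_ij |X_ij|.

    X   : candidate precision (inverse covariance) matrix, symmetric
    S   : sample covariance matrix
    lam : non-negative weight on the l_1 penalty (assumed elementwise)
    """
    # The objective is only defined for positive definite X.
    if np.linalg.eigvalsh(X).min() <= 0:
        return -np.inf
    _, logdet = np.linalg.slogdet(X)
    return logdet - np.trace(S @ X) - lam * np.abs(X).sum()

# Hypothetical synthetic data: 200 samples of 4 independent variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 4))
S = np.cov(data, rowvar=False)

X_dense = np.linalg.inv(S)             # unpenalized maximum likelihood estimate
X_sparse = np.diag(1.0 / np.diag(S))   # a sparse (diagonal) candidate

for lam in (0.0, 0.5):
    print(lam,
          penalized_log_likelihood(X_dense, S, lam),
          penalized_log_likelihood(X_sparse, S, lam))
```

With `lam = 0` the dense inverse of S attains the maximum, so it never scores below the diagonal candidate. A positive `lam` charges for every nonzero entry, which is what pushes the solution toward a sparse graph.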

A Probabilistic Calculus of Actions

AAAI Conferences

In planning, however, they are less popular (works by Dean & Kanazawa [1989] and Kushmerick et al. [1993] notwithstanding), partly due to the unsettled, strange relationship between probability and actions. In principle, actions are not part of standard probability theory, and understandably so: probabilities capture normal relationships in the world, while actions represent interventions that perturb those relationships. It is no wonder, then, that actions are treated as foreign entities throughout the literature on probability and statistics; they serve neither as arguments of probability expressions nor as events for conditioning such expressions. Even in the decision-theoretic literature, where actions are the target of op-