Genre
Hybrid Influence Diagrams Using Mixtures of Truncated Exponentials
Cobb, Barry, Shenoy, Prakash P.
Mixtures of truncated exponentials (MTE) potentials are an alternative to discretization for representing continuous chance variables in influence diagrams. Also, MTE potentials can be used to approximate utility functions. This paper introduces MTE influence diagrams, which can represent decision problems without restrictions on the relationships between continuous and discrete chance variables, without limitations on the distributions of continuous chance variables, and without limitations on the nature of the utility functions. In MTE influence diagrams, all probability distributions and the joint utility function (or its multiplicative factors) are represented by MTE potentials and decision nodes are assumed to have discrete state spaces. MTE influence diagrams are solved by variable elimination using a fusion algorithm.
Learning Diagnostic Policies from Examples by Systematic Search
A diagnostic policy specifies what test to perform next, based on the results of previous tests, and when to stop and make a diagnosis. Cost-sensitive diagnostic policies perform tradeoffs between (a) the cost of tests and (b) the cost of misdiagnoses. An optimal diagnostic policy minimizes the expected total cost. We formalize this diagnosis process as a Markov Decision Process (MDP). We investigate two types of algorithms for solving this MDP: systematic search based on AO* algorithm and greedy search (particularly the Value of Information method). We investigate the issue of learning the MDP probabilities from examples, but only as they are relevant to the search for good policies. We do not learn nor assume a Bayesian network for the diagnosis process. Regularizers are developed to control overfitting and speed up the search. This research is the first that integrates overfitting prevention into systematic search. The paper has two contributions: it discusses the factors that make systematic search feasible for diagnosis, and it shows experimentally, on benchmark data sets, that systematic search methods produce better diagnostic policies than greedy methods.
Belief Propagation for Min-cost Network Flow: Convergence and Correctness
Gamarnik, David, Shah, Devavrat, Wei, Yehua
Message passing type algorithms such as the so-called Belief Propagation algorithm have recently gained a lot of attention in the statistics, signal processing and machine learning communities as attractive algorithms for solving a variety of optimization and inference problems. As a decentralized, easy to implement and empirically successful algorithm, BP deserves attention from the theoretical standpoint, and here not much is known at the present stage. In order to fill this gap we consider the performance of the BP algorithm in the context of the capacitated minimum-cost network flow problem - the classical problem in the operations research field. We prove that BP converges to the optimal solution in the pseudo-polynomial time, provided that the optimal solution of the underlying problem is unique and the problem input is integral. Moreover, we present a simple modification of the BP algorithm which gives a fully polynomial-time randomized approximation scheme (FPRAS) for the same problem, which no longer requires the uniqueness of the optimal solution. This is the first instance where BP is proved to have fully-polynomial running time. Our results thus provide a theoretical justification for the viability of BP as an attractive method to solve an important class of optimization problems.
Iterative Conditional Fitting for Gaussian Ancestral Graph Models
Drton, Mathias, Richardson, Thomas S.
Ancestral graph models, introduced by Richardson and Spirtes (2002), generalize both Markov random fields and Bayesian networks to a class of graphs with a global Markov property that is closed under conditioning and marginalization. By design, ancestral graphs encode precisely the conditional independence structures that can arise from Bayesian networks with selection and unobserved (hidden/latent) variables. Thus, ancestral graph models provide a potentially very useful framework for exploratory model selection when unobserved variables might be involved in the data-generating process but no particular hidden structure can be specified. In this paper, we present the Iterative Conditional Fitting (ICF) algorithm for maximum likelihood estimation in Gaussian ancestral graph models. The name reflects that in each step of the procedure a conditional distribution is estimated, subject to constraints, while a marginal distribution is held fixed. This approach is in duality to the well-known Iterative Proportional Fitting algorithm, in which marginal distributions are fitted while conditional distributions are held fixed.
An Integrated, Conditional Model of Information Extraction and Coreference with Applications to Citation Matching
Wellner, Ben, McCallum, Andrew, Peng, Fuchun, Hay, Michael
Although information extraction and coreference resolution appear together in many applications, most current systems perform them as independent steps. This paper describes an approach to integrated inference for extraction and coreference based on conditionally-trained undirected graphical models. We discuss the advantages of conditional probability training, and of a coreference model structure based on graph partitioning. On a data set of research paper citations, we show significant reduction in error by using extraction uncertainty to improve coreference citation matching accuracy, and using coreference to improve the accuracy of the extracted fields.
Active Model Selection
Madani, Omid, Lizotte, Daniel J., Greiner, Russell
Classical learning assumes the learner is given a labeled data sample, from which it learns a model. The field of Active Learning deals with the situation where the learner begins not with a training sample, but instead with resources that it can use to obtain information to help identify the optimal model. To better understand this task, this paper presents and analyses the simplified "(budgeted) active model selection" version, which captures the pure exploration aspect of many active learning problems in a clean and simple problem formulation. Here the learner can use a fixed budget of "model probes" (where each probe evaluates the specified model on a random indistinguishable instance) to identify which of a given set of possible models has the highest expected accuracy. Our goal is a policy that sequentially determines which model to probe next, based on the information observed so far. We present a formal description of this task, and show that it is NPhard in general. We then investigate a number of algorithms for this task, including several existing ones (eg, "Round-Robin", "Interval Estimation", "Gittins") as well as some novel ones (e.g., "Biased-Robin"), describing first their approximation properties and then their empirical performance on various problem instances. We observe empirically that the simple biased-robin algorithm significantly outperforms the other algorithms in the case of identical costs and priors.
Variational Chernoff Bounds for Graphical Models
Ravikumar, Pradeep, Lafferty, John
Recent research has made significant progress on the problem of bounding log partition functions for exponential family graphical models. Such bounds have associated dual parameters that are often used as heuristic estimates of the marginal probabilities required in inference and learning. However these variational estimates do not give rigorous bounds on marginal probabilities, nor do they give estimates for probabilities of more general events than simple marginals. In this paper we build on this recent work by deriving rigorous upper and lower bounds on event probabilities for graphical models. Our approach is based on the use of generalized Chernoff bounds to express bounds on event probabilities in terms of convex optimization problems; these optimization problems, in turn, require estimates of generalized log partition functions. Simulations indicate that this technique can result in useful, rigorous bounds to complement the heuristic variational estimates, with comparable computational cost.
The Author-Topic Model for Authors and Documents
Rosen-Zvi, Michal, Griffiths, Thomas, Steyvers, Mark, Smyth, Padhraic
We intro duce the author-topic mo del, a generative mo del for do cuments that extends Latent Dirichlet Allo cation (LDA; Blei, Ng, & Jordan, 2003) to include authorship information. Each author is asso ciated with a multinomial distribution over topics and each topic is asso ciated with a multinomial distribution over words. A do cument with multiple authors is mo deled as a distribution over topics that is a mixture of the distributions asso ci-ated with the authors. We apply the mo del to a collection of 1,700 NIPS conference pap ers and 160,000 CiteSeer abstracts. Exact inference is intractable for these datasets and we use Gibbs sampling to estimate the topic and author distributions. We compare the p erformance with two other generative mo d-els for do cuments, which are sp ecial cases of the author-topic mo del: LDA (a topic mo del) and a simple author mo del in which each author is asso ciated with a distribution over words rather than a distribution over topics. We show topics recovered by the author-topic mo del, and demonstrate applications to computing similarity b etween authors and entropy of author output.
Factored Latent Analysis for far-field tracking data
This paper uses Factored Latent Analysis (FLA) to learn a factorized, segmental representation for observations of tracked objects over time. Factored Latent Analysis is latent class analysis in which the observation space is subdivided and each aspect of the original space is represented by a separate latent class model. One could simply treat these factors as completely independent and ignore their interdependencies or one could concatenate them together and attempt to learn latent class structure for the complete observation space. Alternatively, FLA allows the interdependencies to be exploited in estimating an effective model, which is also capable of representing a factored latent state. In this paper, FLA is used to learn a set of factored latent classes to represent different modalities of observations of tracked objects. Different characteristics of the state of tracked objects are each represented by separate latent class models, including normalized size, normalized speed, normalized direction, and position. This model also enables effective temporal segmentation of these sequences. This method is data-driven, unsupervised using only pairwise observation statistics. This data-driven and unsupervised activity classi- fication technique exhibits good performance in multiple challenging environments.
Graph partition strategies for generalized mean field inference
Xing, Eric P., Jordan, Michael I., Russell, Stuart
An autonomous variational inference algorithm for arbitrary graphical models requires the ability to optimize variational approximations over the space of model parameters as well as over the choice of tractable families used for the variational approximation. In this paper, we present a novel combination of graph partitioning algorithms with a generalized mean field (GMF) inference algorithm. This combination optimizes over disjoint clustering of variables and performs inference using those clusters. We provide a formal analysis of the relationship between the graph cut and the GMF approximation, and explore several graph partition strategies empirically. Our empirical results provide rather clear support for a weighted version of MinCut as a useful clustering algorithm for GMF inference, which is consistent with the implications from the formal analysis.