Bayesian Inference
MAP Complexity Results and Approximation Methods
MAP is the problem of finding a most probable instantiation of a set of nvariables in a Bayesian network, given some evidence. MAP appears to be a significantly harder problem than the related problems of computing the probability of evidence Pr, or MPE a special case of MAP. Because of the complexity of MAP, and the lack of viable algorithms to approximate it,MAP computations are generally avoided by practitioners. This paper investigates the complexity of MAP. We show that MAP is complete for NP. We also provide negative complexity results for elimination based algorithms. It turns out that MAP remains hard even when MPE, and Pr are easy. We show that MAP is NPcomplete when the networks are restricted to polytrees, and even then can not be effectively approximated. Because there is no approximation algorithm with guaranteed results, we investigate best effort approximations. We introduce a generic MAP approximation framework. As one instantiation of it, we implement local search coupled with belief propagation BP to approximate MAP. We show how to extract approximate evidence retraction information from belief propagation which allows us to perform efficient local search. This allows MAP approximation even on networks that are too complex to even exactly solve the easier problems of computing Pr or MPE. Experimental results indicate that using BP and local search provides accurate MAP estimates in many cases.
Finding Optimal Bayesian Networks
Chickering, David Maxwell, Meek, Christopher
In this paper, we derive optimality results for greedy Bayesian-network search algorithms that perform single-edge modifications at each step and use asymptotically consistent scoring criteria. Our results extend those of Meek (1997) and Chickering (2002), who demonstrate that in the limit of large datasets, if the generative distribution is perfect with respect to a DAG defined over the observable variables, such search algorithms will identify this optimal (i.e. We relax their assumption about the generative distribution, and assume only that this distribution satisfies the composition property over the observable variables, which is a more realistic assumption for real domains. Under this assumption, we guarantee that the search algorithms identify an inclusion-optimal model; that is, a model that (1) contains the generative distribution and (2) has no sub-model that contains this distribution. In addition, we show that the composition property is guaranteed to hold whenever the dependence relationships in the generative distribution can be characterized by paths between singleton elements in some generative graphical model (e.g. a DAG, a chain graph, or a Markov network) even when the generative model includes unobserved variables, and even when the observed data is subject to selection bias. Introduction The problem of learning Bayesian networks (a.k.a directed graphical models) from data has received much attention in the UAI community. A simple approach taken by many researchers, particularly those contributing experimental papers, is to apply--in conjunction with a scoring criterion--a greedy single-edge search algorithm to the space of Bayesian-network structures or to the space of equivalence classes of those structures. There are a number of important reasons for the popularity of this approach.
Continuation Methods for Mixing Heterogenous Sources
Corduneanu, Adrian, Jaakkola, Tommi S.
A number of modern learning tasks involve estimation from heterogeneous information sources. This includes classification with labeled and unlabeled data as well as other problems with analogous structure such as competitive (game theoretic) problems. The associated estimation problems can be typically reduced to solving a set of fixed point equations (consistency conditions). We introduce a general method for combining a preferred information source with another in this setting by evolving continuous paths of fixed points at intermediate allocations. We explicitly identify critical points along the unique paths to either increase the stability of estimation or to ensure a significant departure from the initial source. The homotopy continuation approach is guaranteed to terminate at the second source, and involves no combinatorial effort. We illustrate the power of these ideas both in classification tasks with labeled and unlabeled data, as well as in the context of a competitive (min-max) formulation of DNA sequence motif discovery.
Coordinates: Probabilistic Forecasting of Presence and Availability
Horvitz, Eric J., Koch, Paul, Kadie, Carl, Jacobs, Andy
We present methods employed in COORDINATE, a prototype service that supports collaboration and communication by learning predictive models that provide forecasts of users' presence and availability. We describe how data is collected about user activity and proximity from multiple devices, in addition to analysis of the content of users' calendars, the time of day, and day of week. We review applications of presence forecasting embedded in the PRIORITIES application and then present details of the COORDINATE service that was informed by the earlier efforts.
Bayesian one-mode projection for dynamic bipartite graphs
Psorakis, Ioannis, Rezek, Iead, Frankel, Zach, Roberts, Stephen J.
We propose a Bayesian methodology for one-mode projecting a bipartite network that is being observed across a series of discrete time steps. The resulting one mode network captures the uncertainty over the presence/absence of each link and provides a probability distribution over its possible weight values. Additionally, the incorporation of prior knowledge over previous states makes the resulting network less sensitive to noise and missing observations that usually take place during the data collection process. The methodology consists of computationally inexpensive update rules and is scalable to large problems, via an appropriate distributed implementation.
Unsupervised Active Learning in Large Domains
Steck, Harald, Jaakkola, Tommi S.
Active learning is a powerful approach to analyzing data effectively. We show that the feasibility of active learning depends crucially on the choice of measure with respect to which the query is being optimized. The standard information gain, for example, does not permit an accurate evaluation with a small committee, a representative subset of the model space. We propose a surrogate measure requiring only a small committee and discuss the properties of this new measure. We devise, in addition, a bootstrap approach for committee selection. The advantages of this approach are illustrated in the context of recovering (regulatory) network models.
Dimension Correction for Hierarchical Latent Class Models
Kocka, Tomas, Zhang, Nevin Lianwen
Model complexity is an important factor to consider when selecting among graphical models. When all variables are observed, the complexity of a model can be measured by its standard dimension, i.e. the number of independent parameters. When hidden variables are present, however, standard dimension might no longer be appropriate. One should instead use effective dimension (Geiger et al. 1996). This paper is concerned with the computation of effective dimension. First we present an upper bound on the effective dimension of a latent class (LC) model. This bound is tight and its computation is easy. We then consider a generalization of LC models called hierarchical latent class (HLC) models (Zhang 2002). We show that the effective dimension of an HLC model can be obtained from the effective dimensions of some related LC models. We also demonstrate empirically that using effective dimension in place of standard dimension improves the quality of models learned from data.
Learning with Scope, with Application to Information Extraction and Classification
Blei, David, Bagnell, J Andrew, McCallum, Andrew
In probabilistic approaches to classification and information extraction, one typically builds a statistical model of words under the assumption that future data will exhibit the same regularities as the training data. In many data sets, however, there are scope-limited features whose predictive power is only applicable to a certain subset of the data. For example, in information extraction from web pages, word formatting may be indicative of extraction category in different ways on different web pages. The difficulty with using such features is capturing and exploiting the new regularities encountered in previously unseen data. In this paper, we propose a hierarchical probabilistic model that uses both local/scope-limited features, such as word formatting, and global features, such as word content. The local regularities are modeled as an unobserved random parameter which is drawn once for each local data set. This random parameter is estimated during the inference process and then used to perform classification with both the local and global features--- a procedure which is akin to automatically retuning the classifier to the local regularities on each newly encountered web page. Exact inference is intractable and we present approximations via point estimates and variational methods. Empirical results on large collections of web data demonstrate that this method significantly improves performance from traditional models of global features alone.
Learning Hierarchical Object Maps Of Non-Stationary Environments with mobile robots
Anguelov, Dragomir, Biswas, Rahul, Koller, Daphne, Limketkai, Benson, Thrun, Sebastian
Building models, or maps, of robot environments is a highly active research area; however, most existing techniques construct unstructured maps and assume static environments. In this paper, we present an algorithm for learning object models of non-stationary objects found in office-type environments. Our algorithm exploits the fact that many objects found in office environments look alike (e.g., chairs, recycling bins). It does so through a two-level hierarchical representation, which links individual objects with generic shape templates of object classes. We derive an approximate EM algorithm for learning shape parameters at both levels of the hierarchy, using local occupancy grid maps for representing shape. Additionally, we develop a Bayesian model selection algorithm that enables the robot to estimate the total number of objects and object templates in the environment. Experimental results using a real robot equipped with a laser range finder indicate that our approach performs well at learning object-based maps of simple office environments. The approach outperforms a previously developed non-hierarchical algorithm that models objects but lacks class templates.
IPF for Discrete Chain Factor Graphs
Iterative Proportional Fitting (IPF), combined with EM, is commonly used as an algorithm for likelihood maximization in undirected graphical models. In this paper, we present two iterative algorithms that generalize upon IPF. The first one is for likelihood maximization in discrete chain factor graphs, which we define as a wide class of discrete variable models including undirected graphical models and Bayesian networks, but also chain graphs and sigmoid belief networks. The second one is for conditional likelihood maximization in standard undirected models and Bayesian networks. In both algorithms, the iteration steps are expressed in closed form. Numerical simulations show that the algorithms are competitive with state of the art methods.