 Learning Graphical Models


Mixtures of Shifted Asymmetric Laplace Distributions

arXiv.org Machine Learning

A mixture of shifted asymmetric Laplace distributions is introduced and used for clustering and classification. A variant of the EM algorithm is developed for parameter estimation by exploiting the relationship with the generalized inverse Gaussian distribution. This approach is mathematically elegant and relatively computationally straightforward. Our novel mixture modelling approach is demonstrated on both simulated and real data to illustrate clustering and classification applications. In these analyses, our mixture of shifted asymmetric Laplace distributions performs favourably when compared to the popular Gaussian approach. This work, which marks an important step in the non-Gaussian model-based clustering and classification direction, concludes with discussion as well as suggestions for future work.
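As an illustration of the structure the abstract above refers to, here is a minimal sketch of sampling from a univariate shifted asymmetric Laplace distribution through its normal variance-mean mixture representation, X = mu + W*alpha + sqrt(W)*sigma*Z with W ~ Exp(1) and Z ~ N(0, 1). The function name `sample_sal` and the parameter values are assumptions for illustration, not taken from the paper, and the sketch does not implement the paper's EM variant.

```python
import math
import random


def sample_sal(mu, alpha, sigma, n, seed=0):
    """Draw n values from a univariate shifted asymmetric Laplace distribution.

    Uses the mixture representation X = mu + W*alpha + sqrt(W)*sigma*Z,
    where W is standard exponential and Z is standard normal.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        w = rng.expovariate(1.0)   # latent mixing variable W ~ Exp(1)
        z = rng.gauss(0.0, 1.0)    # Gaussian noise Z ~ N(0, 1)
        draws.append(mu + w * alpha + math.sqrt(w) * sigma * z)
    return draws


# Hypothetical parameters: shift 1.0, skewness 2.0, scale 1.0.
samples = sample_sal(mu=1.0, alpha=2.0, sigma=1.0, n=200_000)
sample_mean = sum(samples) / len(samples)
print(f"sample mean = {sample_mean:.3f} (theoretical mean is mu + alpha)")
```

Under this representation the mean is mu + alpha, since E[W] = 1. Treating W as latent is what makes an EM-style scheme natural: conditional on an observation, W follows a generalized inverse Gaussian distribution.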


An Experiment with Hierarchical Bayesian Record Linkage

arXiv.org Machine Learning

In record linkage (RL), or exact file matching, the goal is to identify the links between entities with information on two or more files. RL is an important activity in areas including counting the population, enhancing survey frames and data, and conducting epidemiological and follow-up studies. RL is challenging when files are very large, no accurate personal identification (ID) number is present on all files for all units, and some information is recorded with error. Without a unique ID number one must rely on comparisons of names, addresses, dates, and other information to find the links. Latent class models can be used to automatically score the value of information for determining match status. Data for fitting models come from comparisons made within groups of units that pass initial file blocking requirements. Data distributions can vary across blocks. This article examines the use of prior information and hierarchical latent class models in the context of RL.


A Practical Algorithm for Topic Modeling with Provable Guarantees

arXiv.org Machine Learning

Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model inference have been based on a maximum likelihood objective. Efficient algorithms exist that approximate this objective, but they have no provable guarantees. Recently, algorithms have been introduced that provide provable bounds, but these algorithms are not practical because they are inefficient and not robust to violations of model assumptions. In this paper we present an algorithm for topic model inference that is both provable and practical. The algorithm produces results comparable to the best MCMC implementations while running orders of magnitude faster.


Simple Regret Optimization in Online Planning for Markov Decision Processes

arXiv.org Artificial Intelligence

We consider online planning in Markov decision processes (MDPs). In online planning, the agent focuses on its current state only, deliberates about the set of possible policies from that state onwards and, when interrupted, uses the outcome of that exploratory deliberation to choose what action to perform next. The performance of algorithms for online planning is assessed in terms of simple regret, which is the agent's expected performance loss when the chosen action, rather than an optimal one, is followed. To date, state-of-the-art algorithms for online planning in general MDPs are either best effort, or guarantee only polynomial-rate reduction of simple regret over time. Here we introduce a new Monte-Carlo tree search algorithm, BRUE, that guarantees exponential-rate reduction of simple regret and error probability. This algorithm is based on a simple yet non-standard state-space sampling scheme, MCTS2e, in which different parts of each sample are dedicated to different exploratory objectives. Our empirical evaluation shows that BRUE not only provides superior performance guarantees, but is also very effective in practice and compares favorably to the state of the art. We then extend BRUE with a variant of "learning by forgetting." The resulting set of algorithms, BRUE(alpha), generalizes BRUE, improves the exponential factor in the upper bound on its reduction rate, and exhibits even more attractive empirical performance.
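The notion of simple regret used in the abstract above can be made concrete with a small sketch. It assumes the true action values at the current state are known (which a planner never has in practice); the function names and the numbers are hypothetical and are not part of BRUE.

```python
def simple_regret(q_values, chosen_action):
    """Loss from following chosen_action instead of an optimal action."""
    return max(q_values.values()) - q_values[chosen_action]


def expected_simple_regret(q_values, choice_probs):
    """Average simple regret when the planner's recommendation is random.

    choice_probs maps each action to the probability that the planner
    recommends it when interrupted.
    """
    return sum(p * simple_regret(q_values, a) for a, p in choice_probs.items())


# Hypothetical true action values at the current state.
q = {"left": 1.0, "right": 0.7, "wait": 0.2}

print(simple_regret(q, "right"))
print(expected_simple_regret(q, {"left": 0.8, "right": 0.15, "wait": 0.05}))
```

A planner with exponential-rate guarantees drives the probability of recommending a suboptimal action, and hence this expectation, toward zero exponentially fast in the number of samples.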


Probability Bracket Notation: Markov State Chain Projector, Hidden Markov Models and Dynamic Bayesian Networks

arXiv.org Artificial Intelligence

After a brief discussion of Markov Evolution Formula (MEF) expressed in Probability Bracket Notation (PBN), its close relation with the joint probability distribution (JPD) of Visible Markov Models (VMM) is demonstrated by introducing Markov State Chain Projector (MSCP). The state basis and the observed basis are defined in the Sequential Event Space (SES) of Hidden Markov Models (HMM). The JPD of HMM is derived by using basis transformation in SES. The Viterbi algorithm is revisited and applied to the famous Weather HMM example, whose node graph and inference results are displayed by using software package Elvira. In the end, the formulas of VMM, HMM and some factorial HMM (FHMM) are expressed in PBN as instances of dynamic Bayesian Networks (DBN).

Dr. Xing M Wang, 2012-12-16.

1. Introduction: PBN and Discrete Markov Chain. Inspired by the great success of Dirac notation, we have proposed Probability Bracket Notation (PBN) [1], where we have used PBN to discuss Markov chains (see [2] Chap.11). Based on our main topic of this article, we will concentrate on homogeneous, time-discrete first-order Markov chains with finite discrete states.
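The Viterbi algorithm mentioned in the abstract above can be sketched in a few lines. The weather-style HMM below uses illustrative textbook-style parameters; they are assumptions and are not the numbers from the paper's example, and the sketch does not use PBN or Elvira.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path and its joint probability."""
    # best[t][s]: probability of the best path ending in state s at time t.
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score * emit_p[s][obs[t]]
            back[t][s] = prev
    # Backtrack from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state weather HMM (illustrative numbers only).
states = ["Rainy", "Sunny"]
start_p = {"Rainy": 0.6, "Sunny": 0.4}
trans_p = {
    "Rainy": {"Rainy": 0.7, "Sunny": 0.3},
    "Sunny": {"Rainy": 0.4, "Sunny": 0.6},
}
emit_p = {
    "Rainy": {"walk": 0.1, "shop": 0.4, "clean": 0.5},
    "Sunny": {"walk": 0.6, "shop": 0.3, "clean": 0.1},
}

path, prob = viterbi(["walk", "shop", "clean"], states, start_p, trans_p, emit_p)
print(path, prob)
```

For these assumed parameters the most probable path for the observations walk, shop, clean is Sunny, Rainy, Rainy, with joint probability of about 0.01344.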


Equivalence of History and Generator Epsilon-Machines

arXiv.org Machine Learning

Epsilon-machines are minimal, unifilar presentations of stationary stochastic processes. They were originally defined in the history machine sense, as hidden Markov models whose states are the equivalence classes of infinite pasts with the same probability distribution over futures. In analyzing synchronization, though, an alternative generator definition was given: unifilar, edge-emitting hidden Markov models with probabilistically distinct states. The key difference is that history epsilon-machines are defined by a process, whereas generator epsilon-machines define a process. We show here that these two definitions are equivalent in the finite-state case.
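Unifilarity, one of the two conditions in the generator definition above, is easy to check mechanically: from each state, each output symbol may lead to at most one successor state. The sketch below assumes a hypothetical encoding of an edge-emitting HMM as a dictionary of `(symbol, next_state, probability)` edges; it does not check that states are probabilistically distinct.

```python
def is_unifilar(transitions):
    """Check that every (state, symbol) pair has at most one successor state.

    transitions maps each state to a list of (symbol, next_state, probability)
    edges of an edge-emitting hidden Markov model.
    """
    for edges in transitions.values():
        successor = {}
        for symbol, next_state, prob in edges:
            if prob <= 0.0:
                continue  # zero-probability edges cannot be taken
            if symbol in successor and successor[symbol] != next_state:
                return False
            successor[symbol] = next_state
    return True


# Hypothetical two-state machine: unifilar.
unifilar_machine = {
    "A": [("0", "A", 0.5), ("1", "B", 0.5)],
    "B": [("1", "A", 1.0)],
}

# Same states, but symbol "0" from A can lead to either A or B: not unifilar.
nonunifilar_machine = {
    "A": [("0", "A", 0.5), ("0", "B", 0.5)],
    "B": [("1", "A", 1.0)],
}

print(is_unifilar(unifilar_machine), is_unifilar(nonunifilar_machine))
```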


MAP Complexity Results and Approximation Methods

arXiv.org Artificial Intelligence

MAP is the problem of finding a most probable instantiation of a set of n variables in a Bayesian network, given some evidence. MAP appears to be a significantly harder problem than the related problems of computing the probability of evidence (Pr) or MPE (a special case of MAP). Because of the complexity of MAP, and the lack of viable algorithms to approximate it, MAP computations are generally avoided by practitioners. This paper investigates the complexity of MAP. We show that MAP is complete for NP^PP. We also provide negative complexity results for elimination-based algorithms. It turns out that MAP remains hard even when MPE and Pr are easy. We show that MAP is NP-complete when the networks are restricted to polytrees, and even then cannot be effectively approximated. Because there is no approximation algorithm with guaranteed results, we investigate best-effort approximations. We introduce a generic MAP approximation framework. As one instantiation of it, we implement local search coupled with belief propagation (BP) to approximate MAP. We show how to extract approximate evidence retraction information from belief propagation, which allows us to perform efficient local search. This allows MAP approximation even on networks that are too complex to even exactly solve the easier problems of computing Pr or MPE. Experimental results indicate that using BP and local search provides accurate MAP estimates in many cases.
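The difference between MAP and MPE described above can be seen on a tiny example. The sketch below uses brute-force enumeration on a hypothetical two-node network A -> B with made-up probabilities; it is not the paper's local-search method, and brute force is only viable at this scale.

```python
from itertools import product

# Hypothetical two-node network A -> B (illustrative numbers only).
P_A = {0: 0.4, 1: 0.6}
P_B_GIVEN_A = {0: {0: 0.875, 1: 0.125}, 1: {0: 0.5, 1: 0.5}}
DOMAIN = (0, 1)


def joint(a, b):
    """Joint probability P(A=a, B=b) from the network's factorization."""
    return P_A[a] * P_B_GIVEN_A[a][b]


def mpe():
    """Most probable explanation: maximize over all variables."""
    return max(product(DOMAIN, DOMAIN), key=lambda ab: joint(*ab))


def map_over_a():
    """MAP for the subset {A}: sum out B, then maximize over A."""
    return max(DOMAIN, key=lambda a: sum(joint(a, b) for b in DOMAIN))


print("MPE assignment (A, B):", mpe())
print("MAP value of A:", map_over_a())
```

Here the MPE assigns A = 0, because the single most likely joint state is (0, 0) with probability 0.35, while the MAP value of A is 1, because P(A = 1) = 0.6. Projecting the MPE onto the MAP variables therefore gives the wrong answer; MAP mixes maximization with summation over the remaining variables, which is the source of its extra difficulty.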


Finding Optimal Bayesian Networks

arXiv.org Artificial Intelligence

In this paper, we derive optimality results for greedy Bayesian-network search algorithms that perform single-edge modifications at each step and use asymptotically consistent scoring criteria. Our results extend those of Meek (1997) and Chickering (2002), who demonstrate that in the limit of large datasets, if the generative distribution is perfect with respect to a DAG defined over the observable variables, such search algorithms will identify this optimal (i.e. We relax their assumption about the generative distribution, and assume only that this distribution satisfies the composition property over the observable variables, which is a more realistic assumption for real domains. Under this assumption, we guarantee that the search algorithms identify an inclusion-optimal model; that is, a model that (1) contains the generative distribution and (2) has no sub-model that contains this distribution. In addition, we show that the composition property is guaranteed to hold whenever the dependence relationships in the generative distribution can be characterized by paths between singleton elements in some generative graphical model (e.g. a DAG, a chain graph, or a Markov network) even when the generative model includes unobserved variables, and even when the observed data is subject to selection bias. Introduction The problem of learning Bayesian networks (a.k.a directed graphical models) from data has received much attention in the UAI community. A simple approach taken by many researchers, particularly those contributing experimental papers, is to apply--in conjunction with a scoring criterion--a greedy single-edge search algorithm to the space of Bayesian-network structures or to the space of equivalence classes of those structures. There are a number of important reasons for the popularity of this approach.


Continuation Methods for Mixing Heterogenous Sources

arXiv.org Machine Learning

A number of modern learning tasks involve estimation from heterogeneous information sources. This includes classification with labeled and unlabeled data as well as other problems with analogous structure such as competitive (game theoretic) problems. The associated estimation problems can be typically reduced to solving a set of fixed point equations (consistency conditions). We introduce a general method for combining a preferred information source with another in this setting by evolving continuous paths of fixed points at intermediate allocations. We explicitly identify critical points along the unique paths to either increase the stability of estimation or to ensure a significant departure from the initial source. The homotopy continuation approach is guaranteed to terminate at the second source, and involves no combinatorial effort. We illustrate the power of these ideas both in classification tasks with labeled and unlabeled data, as well as in the context of a competitive (min-max) formulation of DNA sequence motif discovery.


Coordinates: Probabilistic Forecasting of Presence and Availability

arXiv.org Artificial Intelligence

We present methods employed in COORDINATE, a prototype service that supports collaboration and communication by learning predictive models that provide forecasts of users' presence and availability. We describe how data is collected about user activity and proximity from multiple devices, in addition to analysis of the content of users' calendars, the time of day, and day of week. We review applications of presence forecasting embedded in the PRIORITIES application and then present details of the COORDINATE service that was informed by the earlier efforts.