Goto

Collaborating Authors

 Bayesian Learning


Learning Continuous Time Bayesian Networks

arXiv.org Machine Learning

Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. We address the problem of learning parameters and structure of a CTBN from fully observed data. We define a conjugate prior for CTBNs, and show how it can be used both for Bayesian parameter estimation and as the basis of a Bayesian score for structure learning. Because acyclicity is not a constraint in CTBNs, we can show that the structure learning problem is significantly easier, both in theory and in practice, than structure learning for dynamic Bayesian networks (DBNs). Furthermore, as CTBNs can tailor the parameters and dependency structure to the different time granularities of the evolution of different variables, they can provide a better fit to continuous-time processes than DBNs with a fixed time granularity.


Automated Analytic Asymptotic Evaluation of the Marginal Likelihood for Latent Models

arXiv.org Machine Learning

We present and implement two algorithms for analytic asymptotic evaluation of the marginal likelihood of data given a Bayesian network with hidden nodes. As shown by previous work, this evaluation is particularly hard for latent Bayesian network models, namely networks that include hidden variables, where asymptotic approximation deviates from the standard BIC score. Our algorithms solve two central difficulties in asymptotic evaluation of marginal likelihood integrals, namely, evaluation of regular dimensionality drop for latent Bayesian network models and computation of non-standard approximation formulas for singular statistics for these models. The presented algorithms are implemented in Matlab and Maple and their usage is demonstrated for marginal likelihood approximations for Bayesian networks with hidden variables.


Learning Riemannian Metrics

arXiv.org Machine Learning

We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given dataset of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pullback metrics of the Fisher information under a continuous group of transformations. When applied to documents, the resulting geodesics resemble, but outperform, the TFIDF cosine similarity measure in classification.


Budgeted Learning of Naive-Bayes Classifiers

arXiv.org Machine Learning

There is almost always a cost associated with acquiring training data. We consider the situation where the learner, with a fixed budget, may'purchase' data during training. In particular, we examine the case where observing the value of a feature of a training example has an associated cost, and the total cost of all feature values acquired during training must remain less than this fixed budget. This paper compares methods for sequentially choosing which feature value to purchase next, given the budget and user's current knowledge of Na'ive Bayes model parameters. Whereas active learning has traditionally focused on myopic (greedy) approaches and uniform/round-robin policies for query selection, this paper shows that such methods are often suboptimal and presents a tractable method for incorporating knowledge of the budget in the information acquisition process.


A New Algorithm for Maximum Likelihood Estimation in Gaussian Graphical Models for Marginal Independence

arXiv.org Machine Learning

Graphical models with bi-directed edges (<->) represent marginal independence: the absence of an edge between two vertices indicates that the corresponding variables are marginally independent. In this paper, we consider maximum likelihood estimation in the case of continuous variables with a Gaussian joint distribution, sometimes termed a covariance graph model. We present a new fitting algorithm which exploits standard regression techniques and establish its convergence properties. Moreover, we contrast our procedure to existing estimation methods.


The Information Bottleneck EM Algorithm

arXiv.org Machine Learning

Learning with hidden variables is a central challenge in probabilistic graphical models that has important implications for many real-life problems. The classical approach is using the Expectation Maximization (EM) algorithm. This algorithm, however, can get trapped in local maxima. In this paper we explore a new approach that is based on the Information Bottleneck principle. In this approach, we view the learning problem as a tradeoff between two information theoretic objectives. The first is to make the hidden variables uninformative about the identity of specific instances. The second is to make the hidden variables informative about the observed attributes. By exploring different tradeoffs between these two objectives, we can gradually converge on a high-scoring solution. As we show, the resulting, Information Bottleneck Expectation Maximization (IB-EM) algorithm, manages to find solutions that are superior to standard EM methods.


Bayesian Hierarchical Mixtures of Experts

arXiv.org Machine Learning

The Hierarchical Mixture of Experts (HME) is a well-known tree-based model for regression and classification, based on soft probabilistic splits. In its original formulation it was trained by maximum likelihood, and is therefore prone to over-fitting. Furthermore the maximum likelihood framework offers no natural metric for optimizing the complexity and structure of the tree. Previous attempts to provide a Bayesian treatment of the HME model have relied either on ad-hoc local Gaussian approximations or have dealt with related models representing the joint distribution of both input and output variables. In this paper we describe a fully Bayesian treatment of the HME model based on variational inference. By combining local and global variational methods we obtain a rigourous lower bound on the marginal probability of the data under the model. This bound is optimized during the training phase, and its resulting value can be used for model order selection. We present results using this approach for a data set describing robot arm kinematics.


Active Collaborative Filtering

arXiv.org Machine Learning

Collaborative filtering (CF) allows the preferences of multiple users to be pooled to make recommendations regarding unseen products. We consider in this paper the problem of online and interactive CF: given the current ratings associated with a user, what queries (new ratings) would most improve the quality of the recommendations made? We cast this terms of expected value of information (EVOI); but the online computational cost of computing optimal queries is prohibitive. We show how offline prototyping and computation of bounds on EVOI can be used to dramatically reduce the required online computation. The framework we develop is general, but we focus on derivations and empirical study in the specific case of the multiple-cause vector quantization model.


CLP(BN): Constraint Logic Programming for Probabilistic Knowledge

arXiv.org Artificial Intelligence

We present CLP(BN), a novel approach that aims at expressing Bayesian networks through the constraint logic programming framework. Arguably, an important limitation of traditional Bayesian networks is that they are propositional, and thus cannot represent relations between multiple similar objects in multiple contexts. Several researchers have thus proposed first-order languages to describe such networks. Namely, one very successful example of this approach are the Probabilistic Relational Models (PRMs), that combine Bayesian networks with relational database technology. The key difficulty that we had to address when designing CLP(cal{BN}) is that logic based representations use ground terms to denote objects. With probabilitic data, we need to be able to uniquely represent an object whose value we are not sure about. We use {sl Skolem functions} as unique new symbols that uniquely represent objects with unknown value. The semantics of CLP(cal{BN}) programs then naturally follow from the general framework of constraint logic programming, as applied to a specific domain where we have probabilistic data. This paper introduces and defines CLP(cal{BN}), and it describes an implementation and initial experiments. The paper also shows how CLP(cal{BN}) relates to Probabilistic Relational Models (PRMs), Ngo and Haddawys Probabilistic Logic Programs, AND Kersting AND De Raedts Bayesian Logic Programs.


Efficient Inference in Large Discrete Domains

arXiv.org Artificial Intelligence

In this paper we examine the problem of inference in Bayesian Networks with discrete random variables that have very large or even unbounded domains. For example, in a domain where we are trying to identify a person, we may have variables that have as domains, the set of all names, the set of all postal codes, or the set of all credit card numbers. We cannot just have big tables of the conditional probabilities, but need compact representations. We provide an inference algorithm, based on variable elimination, for belief networks containing both large domain and normal discrete random variables. We use intensional (i.e., in terms of procedures) and extensional (in terms of listing the elements) representations of conditional probabilities and of the intermediate factors.