Goto

Collaborating Authors

 bdeu


Reviews: Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables

Neural Information Processing Systems

The proposed method is very similar to previous work by Nie et al. -- both use k-trees to search for low-treewidth Bayesian networks, both start with a randomly chosen initial clique, and both propose using an A* method for finding the best tree. The differences are that Nie et al. score k-trees using a mutual information score and use BDeu for choosing the final consistent Bayesian network, while this paper proposes using BIC and incrementally building the Bayesian network along with the k-tree, using the BN to score the k-tree. This paper also includes the additional restriction that the complete variable (partial) order is chosen randomly, while in Nie et al. The main justification for these differences is the ability to scale to large treewidths. However, in the experiments, the previous S2 algorithm also can scale to large treewidths.


Learning All Credible Bayesian Network Structures for Model Averaging

arXiv.org Artificial Intelligence

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known score-and-search approach. However, selecting a single model (i.e., the best scoring BN) can be misleading or may not achieve the best possible accuracy. An alternative to committing to a single model is to perform some form of Bayesian or frequentist model averaging, where the space of possible BNs is sampled or enumerated in some fashion. Unfortunately, existing approaches for model averaging either severely restrict the structure of the Bayesian network or have only been shown to scale to networks with fewer than 30 random variables. In this paper, we propose a novel approach to model averaging inspired by performance guarantees in approximation algorithms. Our approach has two primary advantages. First, our approach only considers credible models in that they are optimal or near-optimal in score. Second, our approach is more efficient and scales to significantly larger Bayesian networks than existing approaches.


Structure Learning from Related Data Sets with a Hierarchical Bayesian Score

arXiv.org Machine Learning

Score functions for learning the structure of Bayesian networks in the literature assume that data are a homogeneous set of observations; whereas it is often the case that they comprise different related, but not homogeneous, data sets collected in different ways. In this paper we propose a new Bayesian Dirichlet score, which we call Bayesian Hierarchical Dirichlet (BHD). The proposed score is based on a hierarchical model that pools information across data sets to learn a single encompassing network structure, while taking into account the differences in their probabilistic structures. We derive a closed-form expression for BHD using a variational approximation of the marginal likelihood and we study its performance using simulated data. We find that, when data comprise multiple related data sets, BHD outperforms the Bayesian Dirichlet equivalent uniform (BDeu) score in terms of reconstruction accuracy as measured by the Structural Hamming distance, and that it is as accurate as BDeu when data are homogeneous. Moreover, the estimated networks are sparser and therefore more interpretable than those obtained with BDeu, thanks to a lower number of false positive arcs.


Finding All Bayesian Network Structures within a Factor of Optimal

arXiv.org Artificial Intelligence

A Bayesian network is a widely used probabilistic graphical model with applications in knowledge discovery and prediction. Learning a Bayesian network (BN) from data can be cast as an optimization problem using the well-known score-and-search approach. However, selecting a single model (i.e., the best scoring BN) can be misleading or may not achieve the best possible accuracy. An alternative to committing to a single model is to perform some form of Bayesian or frequentist model averaging, where the space of possible BNs is sampled or enumerated in some fashion. Unfortunately, existing approaches for model averaging either severely restrict the structure of the Bayesian network or have only been shown to scale to networks with fewer than 30 random variables. In this paper, we propose a novel approach to model averaging inspired by performance guarantees in approximation algorithms. Our approach has two primary advantages. First, our approach only considers credible models in that they are optimal or near-optimal in score. Second, our approach is more efficient and scales to significantly larger Bayesian networks than existing approaches.


Dirichlet Bayesian Network Scores and the Maximum Relative Entropy Principle

arXiv.org Machine Learning

A classic approach for learning Bayesian networks from data is to identify a maximum a posteriori (MAP) network structure. In the case of discrete Bayesian networks, MAP networks are selected by maximising one of several possible Bayesian Dirichlet (BD) scores; the most famous is the Bayesian Dirichlet equivalent uniform (BDeu) score from Heckerman et al (1995). The key properties of BDeu arise from its uniform prior over the parameters of each local distribution in the network, which makes structure learning computationally efficient; it does not require the elicitation of prior knowledge from experts; and it satisfies score equivalence. In this paper we will review the derivation and the properties of BD scores, and of BDeu in particular, and we will link them to the corresponding entropy estimates to study them from an information theoretic perspective. To this end, we will work in the context of the foundational work of Giffin and Caticha (2007), who showed that Bayesian inference can be framed as a particular case of the maximum relative entropy principle. We will use this connection to show that BDeu should not be used for structure learning from sparse data, since it violates the maximum relative entropy principle; and that it is also problematic from a more classic Bayesian model selection perspective, because it produces Bayes factors that are sensitive to the value of its only hyperparameter. Using a large simulation study, we found in our previous work (Scutari, 2016) that the Bayesian Dirichlet sparse (BDs) score seems to provide better accuracy in structure learning; in this paper we further show that BDs does not suffer from the issues above, and we recommend to use it for sparse data instead of BDeu. Finally, will show that these issues are in fact different aspects of the same problem and a consequence of the distributional assumptions of the prior.


Beyond Uniform Priors in Bayesian Network Structure Learning

arXiv.org Machine Learning

Bayesian network structure learning is often performed in a Bayesian setting, evaluating candidate structures using their posterior probabilities for a given data set. Score-based algorithms then use those posterior probabilities as an objective function and return the maximum a posteriori network as the learned model. For discrete Bayesian networks, the canonical choice for a posterior score is the Bayesian Dirichlet equivalent uniform (BDeu) marginal likelihood with a uniform (U) graph prior, which assumes a uniform prior both on the network structures and on the parameters of the networks. In this paper, we investigate the problems arising from these assumptions, focusing on those caused by small sample sizes and sparse data. We then propose an alternative posterior score: the Bayesian Dirichlet sparse (BDs) marginal likelihood with a marginal uniform (MU) graph prior. Like U BDeu, MU BDs does not require any prior information on the probabilistic structure of the data and can be used as a replacement noninformative score. We study its theoretical properties and we evaluate its performance in an extensive simulation study, showing that MU BDs is both more accurate than U BDeu in learning the structure of the network and competitive in predicting power, while not being computationally more complex to estimate.


An Empirical-Bayes Score for Discrete Bayesian Networks

arXiv.org Machine Learning

Bayesian network structure learning is often performed in a Bayesian setting, by evaluating candidate structures using their posterior probabilities for a given data set. Score-based algorithms then use those posterior probabilities as an objective function and return the maximum a posteriori network as the learned model. For discrete Bayesian networks, the canonical choice for a posterior score is the Bayesian Dirichlet equivalent uniform (BDeu) marginal likelihood with a uniform (U) graph prior (Heckerman et al., 1995). Its favourable theoretical properties descend from assuming a uniform prior both on the space of the network structures and on the space of the parameters of the network. In this paper, we revisit the limitations of these assumptions; and we introduce an alternative set of assumptions and the resulting score: the Bayesian Dirichlet sparse (BDs) empirical Bayes marginal likelihood with a marginal uniform (MU) graph prior. We evaluate its performance in an extensive simulation study, showing that MU+BDs is more accurate than U+BDeu both in learning the structure of the network and in predicting new observations, while not being computationally more complex to estimate.


A Bayesian Network Scoring Metric That Is Based On Globally Uniform Parameter Priors

arXiv.org Artificial Intelligence

We introduce a new Bayesian network (BN) scoring metric called the Global Uniform (GU) metric. This metric is based on a particular type of default parameter prior. Such priors may be useful when a BN developer is not willing or able to specify domain-specific parameter priors. The GU parameter prior specifies that every prior joint probability distribution P consistent with a BN structure S is considered to be equally likely. Distribution P is consistent with S if P includes just the set of independence relations defined by S. We show that the GU metric addresses some undesirable behavior of the BDeu and K2 Bayesian network scoring metrics, which also use particular forms of default parameter priors. A closed form formula for computing GU for special classes of BNs is derived. Efficiently computing GU for an arbitrary BN remains an open problem.


Learning networks determined by the ratio of prior and data

arXiv.org Machine Learning

Recent reports have described that the equivalent sample size (ESS) in a Dirichlet prior plays an important role in learning Bayesian networks. This paper provides an asymptotic analysis of the marginal likelihood score for a Bayesian network. Results show that the ratio of the ESS and sample size determine the penalty of adding arcs in learning Bayesian networks. The number of arcs increases monotonically as the ESS increases; the number of arcs monotonically decreases as the ESS decreases. Furthermore, the marginal likelihood score provides a unified expression of various score metrics by changing prior knowledge.


Robust learning Bayesian networks for prior belief

arXiv.org Machine Learning

In addition, the Dirichlet prior is known as a distribution that ensures likelihood equivalence; this score is known as \Bayesian Dirichlet equivalence (BDe)" (Heckerman et al., 1995). Given no prior knowledge, the Bayesian Dirichlet equivalence uniform (BDeu), as proposed earlier by Buntine (1991), is often used. Actually, BDe(u) requires an \equivalent sample size (ESS)", which re ects the degree of a user's prior belief. Moreover, recent studies have demonstrated that ESS plays an important role in the resulting network structure estimate. Steck and Jaakkola (2002) demonstrated that the deletion of an arc in a Bayesian network is more likely to occur as ESS goes asymptotically to zero for a large sample.