Directed Networks
Learning Measurement Models for Unobserved Variables
Silva, Ricardo, Scheines, Richard, Glymour, Clark, Spirtes, Peter L.
Observed associations in a database may be due in whole or part to variations in unrecorded ("latent") variables. Identifying such variables and their causal relationships with one another is a principal goal in many scientific and practical domains. Previous work shows that, given a partition of observed variables such that members of a class share only a single latent common cause, standard search algorithms for causal Bayes nets can infer structural relations between latent variables. We introduce an algorithm for discovering such partitions when they exist. Uniquely among available procedures, the algorithm is (asymptotically) correct under standard assumptions in causal Bayes net search algorithms, requires no prior knowledge of the number of latent variables, and does not depend on the mathematical form of the relationships among the latent variables. We evaluate the algorithm on a variety of simulated data sets.
Locally Weighted Naive Bayes
Frank, Eibe, Hall, Mark, Pfahringer, Bernhard
Despite its simplicity, the naive Bayes classifier has surprised machine learning researchers by exhibiting good performance on a variety of learning problems. Encouraged by these results, researchers have looked to overcome naive Bayes primary weakness - attribute independence - and improve the performance of the algorithm. This paper presents a locally weighted version of naive Bayes that relaxes the independence assumption by learning local models at prediction time. Experimental results show that locally weighted naive Bayes rarely degrades accuracy compared to standard naive Bayes and, in many cases, improves accuracy dramatically. The main advantage of this method compared to other techniques for enhancing naive Bayes is its conceptual and computational simplicity.
Reasoning about Bayesian Network Classifiers
Bayesian network classifiers are used in many fields, and one common class of classifiers are naive Bayes classifiers. In this paper, we introduce an approach for reasoning about Bayesian network classifiers in which we explicitly convert them into Ordered Decision Diagrams (ODDs), which are then used to reason about the properties of these classifiers. Specifically, we present an algorithm for converting any naive Bayes classifier into an ODD, and we show theoretically and experimentally that this algorithm can give us an ODD that is tractable in size even given an intractable number of instances. Since ODDs are tractable representations of classifiers, our algorithm allows us to efficiently test the equivalence of two naive Bayes classifiers and characterize discrepancies between them. We also show a number of additional results including a count of distinct classifiers that can be induced by changing some CPT in a naive Bayes classifier, and the range of allowable changes to a CPT which keeps the current classifier unchanged.
Learning Module Networks
Segal, Eran, Pe'er, Dana, Regev, Aviv, Koller, Daphne, Friedman, Nir
Methods for learning Bayesian network structure can discover dependency structure between observed variables, and have been shown to be useful in many applications. However, in domains that involve a large number of variables, the space of possible network structures is enormous, making it difficult, for both computational and statistical reasons, to identify a good model. In this paper, we consider a solution to this problem, suitable for domains where many variables have similar behavior. Our method is based on a new class of models, which we call module networks. A module network explicitly represents the notion of a module - a set of variables that have the same parents in the network and share the same conditional probability distribution. We define the semantics of module networks, and describe an algorithm that learns a module network from data. The algorithm learns both the partitioning of the variables into modules and the dependency structure between the variables. We evaluate our algorithm on synthetic data, and on real data in the domains of gene expression and the stock market. Our results show that module networks generalize better than Bayesian networks, and that the learned module network structure reveals regularities that are obscured in learned Bayesian networks.
Stochastic complexity of Bayesian networks
Yamazaki, Keisuke, Watanbe, Sumio
Bayesian networks are now being used in enormous fields, for example, diagnosis of a system, data mining, clustering and so on. In spite of their wide range of applications, the statistical properties have not yet been clarified, because the models are nonidentifiable and non-regular. In a Bayesian network, the set of its parameter for a smaller model is an analytic set with singularities in the space of large ones. Because of these singularities, the Fisher information matrices are not positive definite. In other words, the mathematical foundation for learning was not constructed. In recent years, however, we have developed a method to analyze non-regular models using algebraic geometry. This method revealed the relation between the models singularities and its statistical properties. In this paper, applying this method to Bayesian networks with latent variables, we clarify the order of the stochastic complexities.Our result claims that the upper bound of those is smaller than the dimension of the parameter space. This means that the Bayesian generalization error is also far smaller than that of regular model, and that Schwarzs model selection criterion BIC needs to be improved for Bayesian networks.
Learning Continuous Time Bayesian Networks
Nodelman, Uri, Shelton, Christian R., Koller, Daphne
Continuous time Bayesian networks (CTBNs) describe structured stochastic processes with finitely many states that evolve over continuous time. A CTBN is a directed (possibly cyclic) dependency graph over a set of variables, each of which represents a finite state continuous time Markov process whose transition model is a function of its parents. We address the problem of learning parameters and structure of a CTBN from fully observed data. We define a conjugate prior for CTBNs, and show how it can be used both for Bayesian parameter estimation and as the basis of a Bayesian score for structure learning. Because acyclicity is not a constraint in CTBNs, we can show that the structure learning problem is significantly easier, both in theory and in practice, than structure learning for dynamic Bayesian networks (DBNs). Furthermore, as CTBNs can tailor the parameters and dependency structure to the different time granularities of the evolution of different variables, they can provide a better fit to continuous-time processes than DBNs with a fixed time granularity.
Automated Analytic Asymptotic Evaluation of the Marginal Likelihood for Latent Models
We present and implement two algorithms for analytic asymptotic evaluation of the marginal likelihood of data given a Bayesian network with hidden nodes. As shown by previous work, this evaluation is particularly hard for latent Bayesian network models, namely networks that include hidden variables, where asymptotic approximation deviates from the standard BIC score. Our algorithms solve two central difficulties in asymptotic evaluation of marginal likelihood integrals, namely, evaluation of regular dimensionality drop for latent Bayesian network models and computation of non-standard approximation formulas for singular statistics for these models. The presented algorithms are implemented in Matlab and Maple and their usage is demonstrated for marginal likelihood approximations for Bayesian networks with hidden variables.
Learning Riemannian Metrics
We consider the problem of learning a Riemannian metric associated with a given differentiable manifold and a set of points. Our approach to the problem involves choosing a metric from a parametric family that is based on maximizing the inverse volume of a given dataset of points. From a statistical perspective, it is related to maximum likelihood under a model that assigns probabilities inversely proportional to the Riemannian volume element. We discuss in detail learning a metric on the multinomial simplex where the metric candidates are pullback metrics of the Fisher information under a continuous group of transformations. When applied to documents, the resulting geodesics resemble, but outperform, the TFIDF cosine similarity measure in classification.
Budgeted Learning of Naive-Bayes Classifiers
Lizotte, Daniel J., Madani, Omid, Greiner, Russell
There is almost always a cost associated with acquiring training data. We consider the situation where the learner, with a fixed budget, may'purchase' data during training. In particular, we examine the case where observing the value of a feature of a training example has an associated cost, and the total cost of all feature values acquired during training must remain less than this fixed budget. This paper compares methods for sequentially choosing which feature value to purchase next, given the budget and user's current knowledge of Na'ive Bayes model parameters. Whereas active learning has traditionally focused on myopic (greedy) approaches and uniform/round-robin policies for query selection, this paper shows that such methods are often suboptimal and presents a tractable method for incorporating knowledge of the budget in the information acquisition process.
A New Algorithm for Maximum Likelihood Estimation in Gaussian Graphical Models for Marginal Independence
Drton, Mathias, Richardson, Thomas S.
Graphical models with bi-directed edges (<->) represent marginal independence: the absence of an edge between two vertices indicates that the corresponding variables are marginally independent. In this paper, we consider maximum likelihood estimation in the case of continuous variables with a Gaussian joint distribution, sometimes termed a covariance graph model. We present a new fitting algorithm which exploits standard regression techniques and establish its convergence properties. Moreover, we contrast our procedure to existing estimation methods.