AITopics | Learning Graphical Models

Learning Graphical Models

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning. (Wikipedia)
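To make the definition concrete, the sketch below shows how a directed graph licenses a factorization of the joint distribution into one conditional distribution per node. The three binary variables (Rain, WetGrass, Slippery) and every probability are hypothetical and chosen only for illustration.

```python
from itertools import product

# Hypothetical chain Rain -> WetGrass -> Slippery; all three variables are binary.
p_rain = {True: 0.2, False: 0.8}
p_wet_given_rain = {True: {True: 0.9, False: 0.1},     # P(W | R=True)
                    False: {True: 0.1, False: 0.9}}    # P(W | R=False)
p_slip_given_wet = {True: {True: 0.7, False: 0.3},     # P(S | W=True)
                    False: {True: 0.05, False: 0.95}}  # P(S | W=False)


def joint(rain, wet, slip):
    """P(R, W, S) = P(R) * P(W | R) * P(S | W), the factorization the graph encodes."""
    return p_rain[rain] * p_wet_given_rain[rain][wet] * p_slip_given_wet[wet][slip]


values = [True, False]
total = sum(joint(r, w, s) for r, w, s in product(values, repeat=3))
p_slip = sum(joint(r, w, True) for r, w in product(values, repeat=2))
print(f"sum of joint = {total:.3f}, P(Slippery) = {p_slip:.3f}")  # prints 1.000 and 0.219
```

Because the graph has no edge from Rain to Slippery, the model only needs P(S | W) rather than P(S | R, W). That reduction in table size is what the conditional dependence structure buys.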

What Regularized Auto-Encoders Learn from the Data Generating Distribution

Alain, Guillaume, Bengio, Yoshua

arXiv.org Machine Learning | Aug-19-2014

What do auto-encoders learn about the underlying data generating distribution? Recent work suggests that some auto-encoder variants do a good job of capturing the local manifold structure of data. This paper clarifies some of these previous observations by showing that minimizing a particular form of regularized reconstruction error yields a reconstruction function that locally characterizes the shape of the data generating density. We show that the auto-encoder captures the score (the derivative of the log-density with respect to the input). This contradicts previous interpretations of reconstruction error as an energy function. Unlike previous results, the theorems provided here are completely generic and do not depend on the parametrization of the auto-encoder: they show what the auto-encoder would tend to if given enough capacity and examples. These results are for a contractive training criterion that we show to be similar to the denoising auto-encoder training criterion with small corruption noise, but with contraction applied to the whole reconstruction function rather than just the encoder. As with score matching, one can consider the proposed training criterion as a convenient alternative to maximum likelihood because it does not involve a partition function. Finally, we show how an approximate Metropolis-Hastings MCMC can be set up to recover samples from the estimated distribution, and this is confirmed in sampling experiments.
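The headline result can be checked in a case with a closed form. Assume 1-D Gaussian data x ~ N(0, s²) and Gaussian corruption with variance σ² (assumptions of this illustration, not the paper's general setting). The minimum-squared-error reconstruction is then E[x | x̃] = s² x̃ / (s² + σ²), so (r(x) - x) / σ² = -x / (s² + σ²), which approaches the true score -x / s² as σ² shrinks. The sketch below uses this Bayes-optimal denoiser in place of a trained auto-encoder; the function names are hypothetical.

```python
def optimal_reconstruction(x_noisy, s2, sigma2):
    """Minimum-MSE reconstruction E[x | x_noisy] for x ~ N(0, s2), x_noisy = x + N(0, sigma2)."""
    return x_noisy * s2 / (s2 + sigma2)


def true_score(x, s2):
    """Score of N(0, s2): d/dx log p(x) = -x / s2."""
    return -x / s2


def score_from_reconstruction(x, s2, sigma2):
    """Score estimate implied by the reconstruction function: (r(x) - x) / sigma^2."""
    return (optimal_reconstruction(x, s2, sigma2) - x) / sigma2


s2, x = 2.0, 1.5
for sigma2 in (1.0, 0.1, 0.001):
    est = score_from_reconstruction(x, s2, sigma2)
    print(f"sigma^2={sigma2:<6} estimate={est:.4f} true={true_score(x, s2):.4f}")
```

The estimate is exactly the score of the noise-smoothed density N(0, s² + σ²), so it converges to the score of the data density only as σ² goes to zero, which is the small-noise regime the abstract refers to.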

artificial intelligence, machine learning, manifold

arXiv:1211.4246

Country: North America > Canada (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

PGMHD: A Scalable Probabilistic Graphical Model for Massive Hierarchical Data Problems

AlJadda, Khalifeh, Korayem, Mohammed, Ortiz, Camilo, Grainger, Trey, Miller, John A., York, William S.

arXiv.org Artificial Intelligence | Aug-19-2014

In the big data era, scalability has become a crucial requirement for any useful computational model. Probabilistic graphical models are very useful for mining and discovering data insights, but they are not scalable enough to be suitable for big data problems. Bayesian networks in particular demonstrate this limitation when their data is represented using few random variables while each random variable has a massive set of values. With hierarchical data (data arranged in a treelike structure with several levels), one would expect to see hundreds of thousands or millions of values distributed over even a small number of levels. When modeling this kind of hierarchical data across large data sets, Bayesian networks become infeasible for representing the probability distributions for the following reasons: (i) each level represents a single random variable with hundreds of thousands of values; (ii) the number of levels is usually small, so there are also few random variables; and (iii) the structure of the network is predefined, since the dependency is modeled top-down from each parent to each of its child nodes, so the network would contain a single linear path for the random variables from each parent to each child node. In this paper we present a scalable probabilistic graphical model that overcomes these limitations for massive hierarchical data. We believe the proposed model will lead to an easily scalable, more readable, and expressive implementation for problems that require probabilistic solutions for massive amounts of hierarchical data. We successfully applied this model to two challenging probabilistic problems on massive hierarchical data sets in different domains, namely bioinformatics and latent semantic discovery over search logs.
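A minimal sketch of the idea of modeling individual values rather than whole levels. It assumes, as an illustration and not as the PGMHD algorithm itself, that every observed value becomes its own node, that edges only connect a node at level k to a node at level k + 1, and that P(child | parent) is estimated from co-occurrence counts. The class name `LevelGraph` and the sample records are hypothetical.

```python
from collections import defaultdict


class LevelGraph:
    """Hypothetical sketch: one node per (level, value); edges run from level k to k + 1
    and store co-occurrence counts used to estimate P(child | parent)."""

    def __init__(self):
        self.parent_total = defaultdict(int)   # (level, parent) -> times seen as a parent
        self.edge_count = defaultdict(int)     # ((level, parent), child) -> co-occurrences
        self.children = defaultdict(set)       # (level, parent) -> observed child values

    def observe(self, level, parent, child):
        """Record one parent (at `level`) -> child (at `level + 1`) co-occurrence."""
        key = (level, parent)
        self.parent_total[key] += 1
        self.edge_count[(key, child)] += 1
        self.children[key].add(child)

    def p_child_given_parent(self, level, parent, child):
        """Relative co-occurrence frequency; 0.0 for a parent that was never observed."""
        key = (level, parent)
        total = self.parent_total.get(key, 0)
        return self.edge_count.get((key, child), 0) / total if total else 0.0


g = LevelGraph()
records = [("A", "a1"), ("A", "a1"), ("A", "a2"), ("B", "b1")]  # hypothetical level-0 -> level-1 pairs
for parent, child in records:
    g.observe(0, parent, child)
print(g.p_child_given_parent(0, "A", "a2"))
```

Storing counts only for observed (parent, child) pairs keeps memory proportional to the number of distinct co-occurrences rather than to the product of the level cardinalities, which is the scaling concern the abstract raises.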

bayesian network, data mining, machine learning

arXiv:1407.5656

Country: North America > United States > Georgia > Clarke County > Athens (0.14)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.96)

Bayesian image segmentations by Potts prior and loopy belief propagation

Tanaka, Kazuyuki, Kataoka, Shun, Yasuda, Muneki, Waizumi, Yuji, Hsu, Chiou-Ting

arXiv.org Machine Learning | Aug-18-2014

This paper presents a Bayesian image segmentation model based on a Potts prior and loopy belief propagation (LBP). The proposed Bayesian model involves several terms, including the pairwise interactions of Potts models and the mean vectors and covariance matrices of the Gaussian distributions used in color image modeling. These terms are often referred to as hyperparameters in statistical machine learning theory. To determine these hyperparameters, we propose a new scheme for hyperparameter estimation based on conditional maximization of entropy in the Potts prior. The algorithm is based on LBP. In addition, we compare our conditional maximum entropy framework with the conventional maximum likelihood framework, and also clarify how first-order phase transitions in LBP for Potts models influence our hyperparameter estimation procedures.
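The inference step can be sketched as sum-product loopy belief propagation on a pairwise Potts model. In this illustration the unary terms are arbitrary positive numbers standing in for the Gaussian color likelihoods, the coupling `J` is fixed rather than estimated, and the function names are hypothetical; the paper's conditional-maximum-entropy hyperparameter update is not shown. On a chain (a tree) the LBP marginals are exact, while on a cycle they are only approximate, which the brute-force comparison illustrates.

```python
import itertools

import numpy as np


def potts_lbp(unary, edges, J, n_iters=50):
    """Sum-product loopy belief propagation on a pairwise Potts MRF.

    unary: (n_nodes, q) array of positive local evidence phi_i(x_i)
    edges: list of undirected (i, j) pairs
    J:     coupling; pairwise potential psi(a, b) = exp(J * [a == b])
    Returns approximate marginals of shape (n_nodes, q)."""
    n, q = unary.shape
    psi = np.exp(J * np.eye(q))
    nbrs = {i: [] for i in range(n)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    # msgs[(i, j)] is the current message from node i to node j, initialised uniform
    msgs = {(i, j): np.full(q, 1.0 / q) for i in range(n) for j in nbrs[i]}
    for _ in range(n_iters):
        new_msgs = {}
        for i, j in msgs:
            # local evidence times all incoming messages except the one from j
            h = unary[i].astype(float)
            for k in nbrs[i]:
                if k != j:
                    h = h * msgs[(k, i)]
            m = psi.T @ h  # m(x_j) = sum over x_i of psi(x_i, x_j) * h(x_i)
            new_msgs[(i, j)] = m / m.sum()
        msgs = new_msgs  # synchronous ("flooding") schedule
    beliefs = unary.astype(float)
    for i in range(n):
        for k in nbrs[i]:
            beliefs[i] *= msgs[(k, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)


def exact_marginals(unary, edges, J):
    """Brute-force marginals by enumerating all q**n labelings (tiny graphs only)."""
    n, q = unary.shape
    marg = np.zeros((n, q))
    for x in itertools.product(range(q), repeat=n):
        w = np.prod([unary[i, x[i]] for i in range(n)])
        w *= np.exp(J * sum(x[i] == x[j] for i, j in edges))
        for i in range(n):
            marg[i, x[i]] += w
    return marg / marg.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
unary = rng.random((4, 3)) + 0.1
chain = [(0, 1), (1, 2), (2, 3)]
cycle = chain + [(3, 0)]
for name, edges in (("chain", chain), ("cycle", cycle)):
    err = np.abs(potts_lbp(unary, edges, 0.8) - exact_marginals(unary, edges, 0.8)).max()
    print(name, "max marginal error:", err)
```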

artificial intelligence, bayesian inference, machine learning

doi: 10.7566/JPSJ.83.124002

arXiv:1404.3012

Country:

Asia > Japan (0.28)
North America > United States > California (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (0.40)

Industry: Education (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Addendum on the scoring of Gaussian directed acyclic graphical models

Kuipers, Jack, Moffa, Giusi, Heckerman, David

arXiv.org Machine Learning | Aug-14-2014

...where $\mathrm{Pa}_i$ are the parent variables of vertex $i$, and $d^Y$ is the data restricted to the coordinates in $Y \subseteq X$. A Bayesian approach to structure discovery in Bayesian networks.
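For context, the Geiger and Heckerman scoring approach that this addendum concerns expresses the marginal likelihood of a Gaussian DAG $G$ over variables $X = \{X_1, \dots, X_n\}$ as a product of per-vertex ratios of marginal likelihoods of data restricted to variable subsets, which is where $\mathrm{Pa}_i$ and $d^Y$ enter. The following is stated from general knowledge rather than from this excerpt, and it omits the explicit normal-Wishart terms whose precise form the addendum addresses:

```latex
p(d \mid G) \;=\; \prod_{i=1}^{n}
  \frac{p\!\left(d^{\,\mathrm{Pa}_i \cup \{X_i\}}\right)}{p\!\left(d^{\,\mathrm{Pa}_i}\right)},
\qquad p\!\left(d^{\,\emptyset}\right) := 1 .
```

Each factor depends only on vertex $i$ and its parents, so the score is decomposable, which is what makes local search over structures practical.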

artificial intelligence, geiger and heckerman, machine learning

doi: 10.1214/14-AOS1217

arXiv:1402.6863

Country: North America > United States > California > Los Angeles County > Los Angeles (0.15)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Convergence rate of Bayesian tensor estimator: Optimal rate without restricted strong convexity

Suzuki, Taiji

arXiv.org Machine Learning | Aug-13-2014

In this paper, we investigate the statistical convergence rate of a Bayesian low-rank tensor estimator. Our problem setting is the regression problem in which a tensor structure underlying the data is estimated. This problem setting occurs in many practical applications, such as collaborative filtering, multi-task learning, and spatio-temporal data analysis. The convergence rate is analyzed in terms of both in-sample and out-of-sample predictive accuracy. It is shown that a near-optimal rate is achieved without any strong convexity of the observation. Moreover, we show that the method adapts to the unknown rank of the true tensor; that is, the near-optimal rate depending on the true rank is achieved even if the rank is not known a priori.
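To make the setting concrete, the sketch below generates data from a low-rank tensor regression model, assuming a CP (canonical polyadic) low-rank format: a rank-R tensor A, Gaussian design tensors X_t, and responses y_t = <A, X_t> + noise. The dimensions, rank, noise level, and function names are hypothetical, and the paper's Bayesian estimator (a posterior over the low-rank factors) is not implemented; the block only shows the observation model and why low rank shrinks the parameter count.

```python
import numpy as np


def cp_tensor(U, V, W):
    """Rank-R CP tensor: A[i, j, k] = sum over r of U[i, r] * V[j, r] * W[k, r]."""
    return np.einsum("ir,jr,kr->ijk", U, V, W)


def observe(A, X, noise_sd, rng):
    """Responses y_t = <A, X_t> + eps_t for design tensors X of shape (T, d1, d2, d3)."""
    return np.einsum("ijk,tijk->t", A, X) + noise_sd * rng.standard_normal(X.shape[0])


rng = np.random.default_rng(0)
dims, R, T = (5, 6, 7), 2, 200  # hypothetical sizes
U, V, W = (rng.standard_normal((d, R)) for d in dims)
A_true = cp_tensor(U, V, W)
X = rng.standard_normal((T, *dims))
y = observe(A_true, X, noise_sd=0.1, rng=rng)

full_params = int(np.prod(dims))      # free entries in an unstructured tensor
cp_params = sum(d * R for d in dims)  # entries in the CP factor matrices
print(full_params, cp_params, np.linalg.matrix_rank(A_true.reshape(dims[0], -1)))
```

Here T = 200 is smaller than the 210 unstructured parameters but far larger than the 36 CP parameters, which is the regime in which exploiting low rank pays off.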

artificial intelligence, machine learning, tensor

arXiv:1408.3092

Country: Asia > Japan (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Marginal Likelihoods for Distributed Parameter Estimation of Gaussian Graphical Models

Meng, Zhaoshi, Wei, Dennis, Wiesel, Ami, Hero, Alfred O. III

arXiv.org Machine Learning | Aug-13-2014

We consider distributed estimation of the inverse covariance matrix, also called the concentration or precision matrix, in Gaussian graphical models. Traditional centralized estimation often requires global inference of the covariance matrix, which can be computationally intensive in large dimensions. Approximate inference based on message-passing algorithms, on the other hand, can lead to unstable and biased estimation in loopy graphical models. In this paper, we propose a general framework for distributed estimation based on a maximum marginal likelihood (MML) approach. This approach computes local parameter estimates by maximizing marginal likelihoods defined with respect to data collected from local neighborhoods. Due to the non-convexity of the MML problem, we introduce and solve a convex relaxation. The local estimates are then combined into a global estimate without the need for iterative message-passing between neighborhoods. The proposed algorithm is naturally parallelizable and computationally efficient, thereby making it suitable for high-dimensional problems. In the classical regime where the number of variables $p$ is fixed and the number of samples $T$ increases to infinity, the proposed estimator is shown to be asymptotically consistent and to improve monotonically as the local neighborhood size increases. In the high-dimensional scaling regime where both $p$ and $T$ increase to infinity, the convergence rate to the true parameters is derived and is seen to be comparable to centralized maximum likelihood estimation. Extensive numerical experiments demonstrate the improved performance of the two-hop version of the proposed estimator, which suffices to almost close the gap to the centralized maximum likelihood estimator at a reduced computational cost.
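The local-estimate-then-combine pattern can be sketched as follows. This is not the paper's MML estimator or its convex relaxation. It relies on a simpler fact about Gaussian graphical models: if $S_i$ holds node $i$ and its graph neighbors, then row $i$ of the inverse of the marginal covariance $\Sigma_{S_i S_i}$ equals row $i$ of the global precision matrix restricted to $S_i$, because node $i$ has no edges leaving $S_i$. The graph is assumed known, and the chain graph and function name are hypothetical.

```python
import numpy as np


def local_precision_estimate(S, neighbors):
    """Combine per-node local estimates of the precision matrix.

    S:         (p, p) covariance matrix (population or sample)
    neighbors: dict mapping node -> list of its neighbors in the (known) graph
    For node i, invert the covariance restricted to {i} plus its one-hop
    neighborhood, keep only row i, and finally symmetrise."""
    p = S.shape[0]
    K_hat = np.zeros((p, p))
    for i in range(p):
        idx = [i] + sorted(neighbors[i])
        local_inv = np.linalg.inv(S[np.ix_(idx, idx)])
        K_hat[i, idx] = local_inv[0]  # row 0 of local_inv belongs to node i
    return 0.5 * (K_hat + K_hat.T)


p = 6
K = 2.0 * np.eye(p) - 0.8 * (np.eye(p, k=1) + np.eye(p, k=-1))  # chain-graph precision
neighbors = {i: [j for j in (i - 1, i + 1) if 0 <= j < p] for i in range(p)}
Sigma = np.linalg.inv(K)
print("exact covariance, max error:", np.abs(local_precision_estimate(Sigma, neighbors) - K).max())

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(p), Sigma, size=5000)
S_hat = np.cov(samples, rowvar=False)
print("sample covariance, max error:", np.abs(local_precision_estimate(S_hat, neighbors) - K).max())
```

With the exact covariance the combined estimate recovers K to numerical precision; with a sample covariance it is only an estimate. In both cases each node solves a problem whose size depends on its neighborhood, not on p.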

artificial intelligence, estimator, machine learning

doi: 10.1109/TSP.2014.2350956

arXiv:1303.4756

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.74)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.74)

Generalization and Robustness of Batched Weighted Average Algorithm with V-geometrically Ergodic Markov Data

Cuong, Nguyen Viet, Ho, Lam Si Tung, Dinh, Vu

arXiv.org Machine Learning | Aug-12-2014

We analyze the generalization and robustness of the batched weighted average algorithm for V-geometrically ergodic Markov data. This algorithm is a good alternative to the empirical risk minimization algorithm when the latter suffers from overfitting or when optimizing the empirical risk is hard. For the generalization of the algorithm, we prove a PAC-style bound on the training sample size for the expected $L_1$-loss to converge to the optimal loss when training data are V-geometrically ergodic Markov chains. For the robustness, we show that if the training target variable's values contain bounded noise, then the generalization bound of the algorithm deviates at most by the range of the noise. Our results can be applied to the regression problem, the classification problem, and the case where there exists an unknown deterministic target hypothesis.
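A minimal sketch that assumes the batched weighted average algorithm takes the usual exponentially weighted form. Each hypothesis h in a finite class receives weight proportional to exp(-η × empirical loss of h on the batch), and predictions are the weighted average of the hypotheses' outputs. The inputs come from a stable Gaussian AR(1) chain, a standard example of a geometrically ergodic Markov chain. The grid of linear slopes, η, and the target are hypothetical choices, and the paper's exact weighting may differ.

```python
import numpy as np


def ar1_chain(n, rng, phi=0.9, sd=1.0):
    """Inputs from an AR(1) Markov chain x_t = phi * x_{t-1} + noise, started from N(0, 1)."""
    x = np.empty(n)
    x[0] = rng.standard_normal()
    for t in range(1, n):
        x[t] = phi * x[t - 1] + sd * rng.standard_normal()
    return x


def exponential_weights(mean_losses, eta):
    """Weights proportional to exp(-eta * loss), shifted by the minimum loss for stability."""
    w = np.exp(-eta * (mean_losses - mean_losses.min()))
    return w / w.sum()


rng = np.random.default_rng(1)
n = 500
x = ar1_chain(n, rng)
y = 0.5 * x + 0.05 * rng.standard_normal(n)                     # hypothetical target
slopes = np.array([0.0, 0.25, 0.5, 0.75, 1.0])                   # finite class h_a(x) = a * x
losses = np.array([np.mean(np.abs(y - a * x)) for a in slopes])  # empirical L1 loss
w = exponential_weights(losses, eta=n)


def predict(x_new):
    """Weighted average of the hypotheses' predictions at x_new."""
    return float(np.sum(w * slopes * x_new))


print(dict(zip(slopes.tolist(), np.round(w, 4).tolist())))
```

A large η concentrates the weight on the empirical-risk minimizer, while a smaller η spreads it across hypotheses, which is how this family interpolates between ERM and plain averaging.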

algorithm, artificial intelligence, machine learning

arXiv:1406.3166

Country: North America > United States > Wisconsin (0.28)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.41)

Comparing Nonparametric Bayesian Tree Priors for Clonal Reconstruction of Tumors

Deshwar, Amit G., Vembu, Shankar, Morris, Quaid

arXiv.org Machine Learning | Aug-11-2014

Statistical machine learning methods, especially nonparametric Bayesian methods, have become increasingly popular for inferring the clonal population structure of tumors. Here we describe the treeCRP, an extension of the Chinese restaurant process (CRP), a popular construction used in nonparametric mixture models, to infer the phylogeny and genotype of major subclonal lineages represented in the population of cancer cells. We also propose new split-merge updates tailored to the subclonal reconstruction problem that improve the mixing time of Markov chains. In comparisons with the tree-structured stick-breaking (TSSB) prior used in PhyloSub, we demonstrate superior mixing and running time using the treeCRP with our new split-merge procedures. We also show that, given the same number of samples, TSSB and treeCRP have similar ability to recover the subclonal structure of a tumor.
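The treeCRP builds on the plain Chinese restaurant process, which can be sampled sequentially: customer i (counting from 0) joins an existing table k with probability n_k / (i + α) and opens a new table with probability α / (i + α). The sketch below samples only this base CRP; the tree-structured extension and the split-merge moves described in the abstract are not shown, and the function names are hypothetical.

```python
import random


def sample_crp(n, alpha, seed=0):
    """Sequentially seat n customers under a CRP with concentration alpha.
    Returns (table index of each customer, table sizes)."""
    rng = random.Random(seed)
    sizes, assignment = [], []
    for _ in range(n):
        # existing table k with weight sizes[k]; a new table with weight alpha
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        assignment.append(k)
    return assignment, sizes


def expected_tables(n, alpha):
    """E[number of tables after n customers] = sum over i < n of alpha / (alpha + i)."""
    return sum(alpha / (alpha + i) for i in range(n))


assignment, sizes = sample_crp(100, alpha=1.5, seed=3)
print(len(sizes), "tables; expected about", round(expected_tables(100, 1.5), 2))
```

The number of tables grows roughly logarithmically in n, which is why CRP-based priors let the number of subclones be inferred rather than fixed in advance.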

artificial intelligence, bayesian inference, machine learning

arXiv:1408.2552

Country: North America > Canada > Ontario > Toronto (0.15)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Oncology > Leukemia (1.00)
Health & Medicine > Therapeutic Area > Hematology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Musculoskeletal (0.94)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Gaussian Process Structural Equation Models with Latent Variables

Silva, Ricardo, Gramacy, Robert B.

arXiv.org Machine Learning | Aug-9-2014

In a variety of disciplines such as social sciences, psychology, medicine and economics, the recorded data are considered to be noisy measurements of latent variables connected by some causal structure. This corresponds to a family of graphical models known as the structural equation model with latent variables. While linear non-Gaussian variants have been well-studied, inference in nonparametric structural equation models is still underdeveloped. We introduce a sparse Gaussian process parameterization that defines a non-linear structure connecting latent variables, unlike common formulations of Gaussian process latent variable models. The sparse parameterization is given a full Bayesian treatment without compromising Markov chain Monte Carlo efficiency. We compare the stability of the sampling procedure and the predictive ability of the model against the current practice.
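As an illustration of the model class, and not of the paper's sparse parameterization or its MCMC sampler, the sketch below draws data from a two-latent structural equation model. The structural link X1 -> X2 is a function f drawn from a Gaussian process with a squared-exponential kernel, and each latent is observed through three noisy linear indicators. The kernel, loadings, noise levels, and function names are hypothetical.

```python
import numpy as np


def se_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential covariance between 1-D input vectors a and b."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


def sample_gp_sem(n, rng, loadings=(1.0, 0.8, 1.2), noise_sd=0.1):
    """Hypothetical SEM: X1 ~ N(0, 1); X2 = f(X1) + e with f ~ GP(0, k_SE);
    each latent is measured by three indicators Y = loading * latent + noise."""
    x1 = rng.standard_normal(n)
    K = se_kernel(x1, x1) + 1e-6 * np.eye(n)  # jitter keeps the Cholesky factorization stable
    f = np.linalg.cholesky(K) @ rng.standard_normal(n)
    x2 = f + noise_sd * rng.standard_normal(n)
    lam = np.asarray(loadings)
    Y1 = x1[:, None] * lam + noise_sd * rng.standard_normal((n, lam.size))
    Y2 = x2[:, None] * lam + noise_sd * rng.standard_normal((n, lam.size))
    return x1, x2, Y1, Y2


x1, x2, Y1, Y2 = sample_gp_sem(50, np.random.default_rng(0))
print(Y1.shape, Y2.shape)
```

Drawing f this way needs a Cholesky factorization of an n-by-n kernel matrix, an O(n³) cost; avoiding that cost is one common motivation for sparse GP parameterizations.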

indicator, latent variable, structural equation model

arXiv:1408.2042

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
South America > Paraguay > Asunción > Asunción (0.05)
Europe > Greece (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Conditional Probability Tree Estimation Analysis and Algorithms

Beygelzimer, Alina, Langford, John, Lifshits, Yuri, Sorkin, Gregory, Strehl, Alexander L.

arXiv.org Machine Learning | Aug-9-2014

We consider the problem of estimating the conditional probability of a label in time O(log n), where n is the number of possible labels. We analyze a natural reduction of this problem to a set of binary regression problems organized in a tree structure, proving a regret bound that scales with the depth of the tree. Motivated by this analysis, we propose the first online algorithm that provably constructs a logarithmic-depth tree on the set of labels to solve this problem. We test the algorithm empirically, showing that it works successfully on a dataset with roughly $10^6$ labels.
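The reduction being analyzed can be sketched as follows. Labels sit at the leaves of a binary tree, every internal node carries a binary regressor estimating the probability of branching right given x, and P(label | x) is the product of the branch probabilities on the root-to-leaf path, so evaluating one label costs O(log n) regressor calls. The logistic regressors, heap indexing, and random weights are assumptions of this illustration, and the paper's online tree-construction algorithm is not shown.

```python
import math

import numpy as np


def label_probability(label, x, weights, n_labels):
    """P(label | x) in a complete binary tree with heap indexing.

    Root is node 1; node k has children 2k (left) and 2k + 1 (right);
    label l sits at leaf n_labels + l. weights[k] is node k's logistic weight vector."""
    node = n_labels + label
    prob = 1.0
    while node > 1:  # walk leaf -> root: log2(n_labels) steps
        parent = node // 2
        p_right = 1.0 / (1.0 + math.exp(-float(weights[parent] @ x)))
        prob *= p_right if node % 2 == 1 else 1.0 - p_right
        node = parent
    return prob


rng = np.random.default_rng(0)
n_labels, dim = 8, 4                             # n_labels must be a power of two here
weights = rng.standard_normal((n_labels, dim))   # row 0 unused; rows 1..7 are internal nodes
x = rng.standard_normal(dim)
probs = [label_probability(lab, x, weights, n_labels) for lab in range(n_labels)]
print(round(sum(probs), 12))
```

Because each internal node splits its probability mass between its two children, the leaf probabilities always sum to one, whatever the regressors output.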

artificial intelligence, inductive learning, machine learning

arXiv:1408.2031

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.63)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
