AITopics | Learning Graphical Models


Learning Graphical Models

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning. (Wikipedia)
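
To make the definition concrete: in a directed graphical model (a Bayesian network), the graph lets the joint distribution be written as a product of one factor per variable, conditioned on that variable's parents. The following is a minimal sketch for a hypothetical three-variable chain Rain -> Sprinkler -> WetGrass. The variable names and all probabilities are made up for illustration.

```python
from itertools import product

# Hypothetical chain Rain -> Sprinkler -> WetGrass; every probability here is illustrative only.
P_RAIN = {True: 0.2, False: 0.8}
P_SPRINKLER_GIVEN_RAIN = {True: {True: 0.01, False: 0.99},   # when it rains, the sprinkler is rarely on
                          False: {True: 0.4, False: 0.6}}
P_WET_GIVEN_SPRINKLER = {True: {True: 0.9, False: 0.1},
                         False: {True: 0.05, False: 0.95}}


def joint(rain, sprinkler, wet):
    """P(R, S, W) = P(R) * P(S | R) * P(W | S): one factor per node given its parents."""
    return (P_RAIN[rain]
            * P_SPRINKLER_GIVEN_RAIN[rain][sprinkler]
            * P_WET_GIVEN_SPRINKLER[sprinkler][wet])


def marginal_wet(wet):
    """P(W = wet), obtained by summing the factorized joint over Rain and Sprinkler."""
    return sum(joint(r, s, wet) for r, s in product([True, False], repeat=2))


print(round(marginal_wet(True), 4))  # 0.3237
```

Because WetGrass depends on Rain only through Sprinkler, the graph encodes the conditional independence W ⟂ R | S, and the joint table never has to be stored in full.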


On learning parametric-output HMMs

Kontorovich, Aryeh, Nadler, Boaz, Weiss, Roi

arXiv.org Machine Learning, Feb-25-2013

Hidden Markov Models (HMMs) are a standard tool in the modeling and analysis of time series, with a wide variety of applications. When the number of hidden states is known, the standard method for estimating the HMM parameters from given observed data is the Baum-Welch algorithm [Baum et al., 1970]. The latter is known to suffer from two serious drawbacks: it tends to converge (i) very slowly and (ii) only to a local maximum. Indeed, the problem of recovering the parameters of a general HMM is provably hard, in several distinct senses [Abe and Warmuth, 1992, Lyngsø and Pedersen, 2001, Terwijn, 2002]. In this paper we consider learning parametric-output HMMs with a finite and known number of hidden states, where the output from each hidden state follows a parametric distribution from a given family.

artificial intelligence, machine learning, output parameter

arXiv.org Machine Learning

1302.6009

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
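
Baum-Welch is an expectation-maximization procedure, and its E-step relies on the forward-backward recursions. As background for the abstract above, here is a minimal sketch of the forward pass only. It computes the log-likelihood that Baum-Welch climbs toward a local maximum. The Gaussian output family and the parameter names are assumptions made for illustration, and this is not the estimator proposed in the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of N(mean, std^2) at x: the assumed parametric output family."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def hmm_log_likelihood(obs, start, trans, means, stds):
    """Scaled forward algorithm: log P(obs) for an HMM with Gaussian outputs.

    start[k]          -- P(first hidden state = k)
    trans[j][k]       -- P(next state = k | current state = j)
    means[k], stds[k] -- parameters of the output distribution of state k
    """
    n = len(start)
    # alpha[k] is proportional to P(o_1..o_t, state_t = k); it is renormalized
    # at every step to avoid underflow, and the log of each normalizer is summed.
    alpha = [start[k] * gaussian_pdf(obs[0], means[k], stds[k]) for k in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[j] * trans[j][k] for j in range(n))
                     * gaussian_pdf(obs[t], means[k], stds[k]) for k in range(n)]
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


# Two hidden states; all numbers are illustrative.
print(hmm_log_likelihood([0.3, -1.2, 2.0], start=[0.6, 0.4],
                         trans=[[0.7, 0.3], [0.2, 0.8]],
                         means=[0.0, 1.5], stds=[1.0, 0.7]))
```

With a single hidden state the recursion reduces to the log-likelihood of i.i.d. Gaussian observations, which is a convenient sanity check.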


Sparse Penalty in Deep Belief Networks: Using the Mixed Norm Constraint

Halkias, Xanadu, Paris, Sebastien, Glotin, Herve

arXiv.org Machine Learning, Feb-22-2013

Deep Belief Networks (DBNs) have been successfully applied to popular machine learning tasks. Specifically, when applied to handwritten digit recognition, DBNs have achieved accuracy rates of approximately 98.8%. In an effort to optimize the data representation achieved by the DBN and maximize its descriptive power, recent advances have focused on inducing sparsity constraints at each layer of the DBN. In this paper we present a theoretical approach for sparsity constraints in the DBN using the mixed norm for both non-overlapping and overlapping groups. We explore how these constraints affect classification accuracy for digit recognition on three different datasets (MNIST, USPS, RIMES) and provide initial estimates of their usefulness by varying parameters such as the group size and overlap percentage.

activation probability, artificial intelligence, machine learning

arXiv.org Machine Learning

1301.3533

Country: North America > United States (0.37)

Genre: Research Report (0.40)

Industry:

Government > Post Office (0.51)
Government > Regional Government > North America Government > United States Government (0.37)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.84)
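
In its common L1/L2 form, the mixed norm mentioned in the abstract above is the sum over groups of the Euclidean norm of each group's entries. This pushes whole groups toward zero together. The following minimal sketch assumes the penalty is applied to a vector of hidden-unit activation probabilities, and that groups are contiguous blocks defined by a group size and an overlap. Both of these are hypothetical parameterizations. How the paper integrates the penalty into DBN training is not shown.

```python
import math


def make_groups(n_units, group_size, overlap):
    """Contiguous groups of hidden-unit indices; consecutive groups share `overlap` units."""
    step = group_size - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than group_size")
    # Stop once a new group would add no unit that the previous group did not already cover.
    return [list(range(start, min(start + group_size, n_units)))
            for start in range(0, n_units - overlap, step)]


def mixed_norm(activations, groups):
    """L1/L2 mixed norm: sum over groups of the L2 norm of that group's activations."""
    return sum(math.sqrt(sum(activations[i] ** 2 for i in g)) for g in groups)


acts = [0.3, 0.4, 0.0, 0.0, 0.0, 0.0]          # illustrative activation probabilities
print(make_groups(6, 3, 1))                    # [[0, 1, 2], [2, 3, 4], [4, 5]]
print(mixed_norm(acts, make_groups(6, 2, 0)))  # about 0.5
```

Because the inner L2 norm is not separable, a group whose activations are all zero contributes nothing, while any non-zero member makes the whole group pay. Overlapping groups let a unit be shared between neighbouring groups.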


High-Dimensional Probability Estimation with Deep Density Models

Rippel, Oren, Adams, Ryan Prescott

arXiv.org Machine Learning, Feb-20-2013

One of the fundamental problems in machine learning is the estimation of a probability distribution from data. Many techniques have been proposed to study the structure of data, most often building around the assumption that observations lie on a lower-dimensional manifold of high probability. It has been more difficult, however, to exploit this insight to build explicit, tractable density models for high-dimensional data. In this paper, we introduce the deep density model (DDM), a new approach to density estimation. We exploit insights from deep learning to construct a bijective map to a representation space, under which the transformation of the distribution of the data is approximately factorized and has identical and known marginal densities. The simplicity of the latent distribution under the model makes it feasible to explore, and the invertibility of the map allows us to characterize the contraction of measure across it. This enables us to compute normalized densities for out-of-sample data. This combination of tractability and flexibility allows us to tackle a variety of probabilistic tasks on high-dimensional datasets, including: rapid computation of normalized densities at test time without evaluating a partition function; generation of samples without MCMC; and characterization of the joint entropy of the data.

artificial intelligence, deep learning, machine learning

arXiv.org Machine Learning

1302.5125

Country: North America (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.34)
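
The identity behind normalized densities from a bijective map is the change-of-variables formula. If z = f(x) is invertible and the latent density q factorizes, then log p(x) = sum_i log q(z_i) + log |det J_f(x)|, and samples come from x = f^{-1}(z) with z drawn from q, with no MCMC. The DDM learns a deep nonlinear f. In the minimal sketch below, a fixed, hypothetical 2x2 affine map with standard normal latent marginals stands in for it.

```python
import math
import random

# Hypothetical invertible affine map z = A x + b (stands in for a learned deep bijection).
A = [[2.0, 1.0], [0.0, 1.5]]
b = [0.5, -1.0]
DET_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]  # constant Jacobian determinant (here 3.0)


def forward(x):
    """Map a data point to representation space."""
    return [A[0][0] * x[0] + A[0][1] * x[1] + b[0],
            A[1][0] * x[0] + A[1][1] * x[1] + b[1]]


def inverse(z):
    """Map a latent point back to data space using the closed-form 2x2 inverse of A."""
    u = [z[0] - b[0], z[1] - b[1]]
    return [(A[1][1] * u[0] - A[0][1] * u[1]) / DET_A,
            (-A[1][0] * u[0] + A[0][0] * u[1]) / DET_A]


def log_std_normal(v):
    """Log-density of N(0, 1): the identical, known latent marginal."""
    return -0.5 * (v * v + math.log(2.0 * math.pi))


def log_density(x):
    """Normalized log p(x): factorized latent log-density plus log|det J|, no partition function."""
    return sum(log_std_normal(zi) for zi in forward(x)) + math.log(abs(DET_A))


def sample(rng):
    """Draw z from the factorized latent distribution and map it back; no MCMC needed."""
    return inverse([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)])


rng = random.Random(0)
x = sample(rng)
print(x, log_density(x))
```

For a nonlinear f, the Jacobian term varies with x. This term measures how much the map locally expands or contracts volume.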


Estimating Continuous Distributions in Bayesian Classifiers

John, George H., Langley, Pat

arXiv.org Machine Learning, Feb-20-2013

When modeling a probability distribution with a Bayesian network, we are faced with the problem of how to handle continuous variables. Most previous work has either solved the problem by discretizing, or assumed that the data are generated by a single Gaussian. In this paper we abandon the normality assumption and instead use statistical methods for nonparametric density estimation. For a naive Bayesian classifier, we present experimental results on a variety of natural and artificial domains, comparing two methods of density estimation: assuming normality and modeling each conditional distribution with a single Gaussian; and using nonparametric kernel density estimation. We observe large reductions in error on several natural and artificial data sets, which suggests that kernel estimation is a useful tool for learning Bayesian models.

artificial intelligence, bayesian inference, machine learning

arXiv.org Machine Learning

1302.4964

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.83)

Industry: Health & Medicine (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
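
The abstract above compares two ways of modeling each continuous class-conditional distribution in a naive Bayesian classifier: a single Gaussian, and a nonparametric kernel density estimate. The following minimal sketch contrasts the two on one made-up feature. It uses a Gaussian kernel with a fixed bandwidth of 1.0; the bandwidth is an arbitrary assumption, not the paper's setting.

```python
import math


def normal_pdf(x, mean, std):
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def gaussian_estimate(values):
    """Single-Gaussian class-conditional density (the normality assumption)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    std = max(std, 1e-9)  # guard against a zero-variance feature
    return lambda x: normal_pdf(x, mean, std)


def kernel_estimate(values, bandwidth):
    """Gaussian-kernel density estimate: the average of kernels centred on the training values."""
    return lambda x: sum(normal_pdf(x, v, bandwidth) for v in values) / len(values)


def naive_bayes_predict(x, training, estimator):
    """training maps class label -> list of feature vectors; features are independent given the class."""
    total = sum(len(rows) for rows in training.values())
    best_label, best_score = None, -math.inf
    for label, rows in training.items():
        score = math.log(len(rows) / total)  # log prior
        for j, xj in enumerate(x):
            density = estimator([row[j] for row in rows])(xj)
            score += math.log(max(density, 1e-300))  # avoid log(0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


training = {"A": [[-3.1], [-2.9], [2.9], [3.1]],   # bimodal class, no data near 0
            "B": [[-8.0], [-0.5], [0.5], [8.0]]}   # data near 0 plus far outliers
print(naive_bayes_predict([0.0], training, gaussian_estimate))                  # A
print(naive_bayes_predict([0.0], training, lambda v: kernel_estimate(v, 1.0)))  # B
```

The single Gaussian fitted to the bimodal class A puts its peak at 0, where class A has no data, so it wrongly claims the point. The kernel estimate keeps class A's two modes separate.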


HUGS: Combining Exact Inference and Gibbs Sampling in Junction Trees

Kjærulff, Uffe

arXiv.org Artificial Intelligence, Feb-20-2013

Dawid, Kjaerulff and Lauritzen (1994) provided a preliminary description of a hybrid between Monte Carlo sampling methods and exact local computations in junction trees. By exploiting the strengths of both methods, such hybrid inference methods have the potential to expand the class of problems that can be solved under bounded resources, as well as to solve problems that otherwise resist exact solution. The paper provides a detailed description of a particular instance of such a hybrid scheme, namely the combination of exact inference and Gibbs sampling in discrete Bayesian networks. We argue that this combination calls for an extension of the usual message-passing scheme of ordinary junction trees.

artificial intelligence, bayesian inference, machine learning

arXiv.org Artificial Intelligence

1302.4968

Country: Europe > Denmark (0.15)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
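
HUGS handles some variables exactly inside junction-tree cliques and samples the others. The following minimal sketch illustrates only the Gibbs-sampling half, using plain single-site Gibbs on a hypothetical three-node chain A -> B -> C with evidence on C. Exact enumeration is included for comparison. It is not the HUGS algorithm, and all probabilities are made up.

```python
import random

P_A = 0.3                       # P(A=1)
P_B_GIVEN_A = {0: 0.2, 1: 0.8}  # P(B=1 | A)
P_C_GIVEN_B = {0: 0.1, 1: 0.7}  # P(C=1 | B)


def bern(p, value):
    """Probability of a binary value under Bernoulli(p)."""
    return p if value == 1 else 1.0 - p


def gibbs_posterior_a(c, n_samples=20000, burn_in=1000, seed=0):
    """Estimate P(A=1 | C=c) by alternately resampling A and B from their full conditionals."""
    rng = random.Random(seed)
    a, b = 0, 0
    count = 0
    for step in range(burn_in + n_samples):
        # A's Markov blanket is {B}: P(A | B, C) is proportional to P(A) P(B | A).
        w1 = bern(P_A, 1) * bern(P_B_GIVEN_A[1], b)
        w0 = bern(P_A, 0) * bern(P_B_GIVEN_A[0], b)
        a = 1 if rng.random() < w1 / (w0 + w1) else 0
        # P(B | A, C) is proportional to P(B | A) P(C | B).
        w1 = bern(P_B_GIVEN_A[a], 1) * bern(P_C_GIVEN_B[1], c)
        w0 = bern(P_B_GIVEN_A[a], 0) * bern(P_C_GIVEN_B[0], c)
        b = 1 if rng.random() < w1 / (w0 + w1) else 0
        if step >= burn_in:
            count += a
    return count / n_samples


def exact_posterior_a(c):
    """Exact P(A=1 | C=c) by summing out B."""
    joint = {a: sum(bern(P_A, a) * bern(P_B_GIVEN_A[a], b) * bern(P_C_GIVEN_B[b], c)
                    for b in (0, 1)) for a in (0, 1)}
    return joint[1] / (joint[0] + joint[1])


print(round(exact_posterior_a(1), 4), round(gibbs_posterior_a(1), 2))
```

In a network where exact enumeration is too costly, only the sampler would be available. The hybrid idea is to sample fewer, larger blocks whose conditionals are computed exactly.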


Stochastic Simulation Algorithms for Dynamic Probabilistic Networks

Kanazawa, Keiji, Koller, Daphne, Russell, Stuart

arXiv.org Artificial Intelligence, Feb-20-2013

Stochastic simulation algorithms such as likelihood weighting often give fast, accurate approximations to posterior probabilities in probabilistic networks, and are the methods of choice for very large networks. Unfortunately, the special characteristics of dynamic probabilistic networks (DPNs), which are used to represent stochastic temporal processes, mean that standard simulation algorithms perform very poorly. In essence, the simulation trials diverge further and further from reality as the process is observed over time. In this paper, we present simulation algorithms that use the evidence observed at each time step to push the set of trials back towards reality. The first algorithm, "evidence reversal" (ER), restructures each time slice of the DPN so that the evidence nodes for the slice become ancestors of the state variables. The second algorithm, called "survival of the fittest" sampling (SOF), "repopulates" the set of trials at each time step using a stochastic reproduction rate weighted by the likelihood of the evidence according to each trial. We compare the performance of each algorithm with likelihood weighting on the original network, and also investigate the benefits of combining the ER and SOF methods. The ER/SOF combination appears to maintain bounded error independent of the number of time steps in the simulation.

artificial intelligence, bayesian inference, machine learning

arXiv.org Artificial Intelligence

1302.4965

Country: North America > United States > California (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.69)
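
The "survival of the fittest" step described above can be sketched as weighting each trial by the likelihood of the current evidence and then resampling the trials in proportion to those weights, much like a bootstrap particle filter. The following is a minimal sketch on a hypothetical two-state DPN with one noisy binary observation per time slice. An exact forward filter is included for comparison. Evidence reversal is not shown, and all parameters are illustrative.

```python
import random

P_STAY = 0.9         # P(X_t = X_{t-1}): hypothetical transition model
P_OBS_CORRECT = 0.8  # P(E_t = X_t): hypothetical sensor model


def evidence_likelihood(x, e):
    return P_OBS_CORRECT if e == x else 1.0 - P_OBS_CORRECT


def sof_filter(evidence, n_trials=5000, seed=0):
    """Estimate P(X_t = 1 | e_1..e_t) for each t with weighting plus repopulation."""
    rng = random.Random(seed)
    trials = [rng.randint(0, 1) for _ in range(n_trials)]  # uniform prior on X_1
    estimates = []
    for t, e in enumerate(evidence):
        if t > 0:
            # Simulate one time step forward for every trial.
            trials = [x if rng.random() < P_STAY else 1 - x for x in trials]
        weights = [evidence_likelihood(x, e) for x in trials]
        total = sum(weights)
        estimates.append(sum(w for x, w in zip(trials, weights) if x == 1) / total)
        # Repopulate: trials reproduce in proportion to their evidence likelihood.
        trials = rng.choices(trials, weights=weights, k=n_trials)
    return estimates


def exact_filter(evidence):
    """Exact forward filtering for the same two-state model."""
    belief = [0.5, 0.5]
    out = []
    for t, e in enumerate(evidence):
        if t > 0:
            belief = [belief[0] * P_STAY + belief[1] * (1 - P_STAY),
                      belief[1] * P_STAY + belief[0] * (1 - P_STAY)]
        unnorm = [belief[x] * evidence_likelihood(x, e) for x in (0, 1)]
        z = sum(unnorm)
        belief = [u / z for u in unnorm]
        out.append(belief[1])
    return out


ev = [1, 1, 0, 1, 1, 1, 0, 0, 0, 1] * 3
print(max(abs(p - q) for p, q in zip(sof_filter(ev), exact_filter(ev))))
```

Without the repopulation line, every trial keeps its original trajectory and the weights multiply across time steps. This is the divergence from reality that the abstract describes for plain likelihood weighting.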


Optimal Discriminant Functions Based On Sampled Distribution Distance for Modulation Classification

Urriza, Paulo, Rebeiz, Eric, Cabric, Danijela

arXiv.org Machine Learning, Feb-19-2013

In this letter, we derive the optimal discriminant functions for modulation classification based on the sampled distribution distance. The proposed method classifies various candidate constellations using a low-complexity approach based on the distribution distance at specific testpoints along the cumulative distribution function. This method, based on the Bayesian decision criterion, asymptotically provides the minimum classification error possible given a set of testpoints. Testpoint locations are also optimized to improve classification performance. The method provides significant gains over existing approaches that also use the distribution of the signal features.

artificial intelligence, bayesian inference, machine learning

arXiv.org Machine Learning

doi: 10.1109/LCOMM.2013.082113.131131

1302.4773

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
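
The core idea in the abstract above is to compare an empirical CDF with each candidate's reference CDF at only a few testpoints. The following minimal sketch makes several assumptions: a single real-valued feature (for example, the in-phase component), noiseless reference constellations, and an unweighted sum of absolute CDF differences as the distance. The paper instead derives optimal discriminant functions and optimizes the testpoint locations, and this sketch attempts neither. The constellations and testpoints are hypothetical.

```python
import math
from bisect import bisect_right


def empirical_cdf_at(samples, testpoints):
    """Fraction of samples that are <= each testpoint."""
    ordered = sorted(samples)
    n = len(ordered)
    return [bisect_right(ordered, t) / n for t in testpoints]


def classify(samples, testpoints, reference_cdfs):
    """Pick the candidate whose reference CDF is closest (sum of absolute gaps) at the testpoints."""
    observed = empirical_cdf_at(samples, testpoints)

    def distance(label):
        return sum(abs(o - r) for o, r in zip(observed, reference_cdfs[label]))

    return min(reference_cdfs, key=distance)


testpoints = [-0.5, 0.0, 0.5]  # hypothetical testpoint locations
s5 = math.sqrt(5.0)
references = {"BPSK": empirical_cdf_at([-1.0, 1.0], testpoints),
              "4-PAM": empirical_cdf_at([-3 / s5, -1 / s5, 1 / s5, 3 / s5], testpoints)}
received = [1.02, -0.97, 0.95, -1.05, 1.1, -0.9]   # illustrative noisy BPSK samples
print(classify(received, testpoints, references))  # BPSK
```

Only a handful of CDF evaluations per candidate are needed, which is where the low complexity comes from. Choosing testpoints where the candidate CDFs differ most makes the comparison more discriminative.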


Gaussian Process Vine Copulas for Multivariate Dependence

Lopez-Paz, David, Hernández-Lobato, José Miguel, Ghahramani, Zoubin

arXiv.org Machine Learning, Feb-16-2013

Copulas allow marginal distributions to be learned separately from the multivariate dependence structure (copula) that links them together into a density function. Vine factorizations ease the learning of high-dimensional copulas by constructing a hierarchy of conditional bivariate copulas. However, to simplify inference, it is common to assume that each of these conditional bivariate copulas is independent of its conditioning variables. In this paper, we relax this assumption by discovering the latent functions that specify the shape of a conditional copula given its conditioning variables. We learn these functions by following a Bayesian approach based on sparse Gaussian processes with expectation propagation for scalable, approximate inference. Experiments on real-world datasets show that, when modeling all conditional dependencies, we obtain better estimates of the underlying copula of the data.

bivariate copula, copula, gaussian process vine copula

arXiv.org Machine Learning

1302.3979

Country:

South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > Colorado > Mesa County > Grand Junction (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)
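
The relaxed assumption in the abstract above means that a conditional bivariate copula's parameter becomes a function of the conditioning variable. The following minimal sketch makes several assumptions: a bivariate Gaussian copula, a tanh link mapping a latent function onto a correlation in (-1, 1), and a hypothetical linear latent function. The paper instead places a sparse Gaussian process prior on the latent function and fits it with expectation propagation. A constant latent function recovers the usual simplifying assumption.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the inverse CDF


def gaussian_copula_density(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho at (u, v) in (0, 1)^2."""
    a, b = _STD_NORMAL.inv_cdf(u), _STD_NORMAL.inv_cdf(v)
    one_minus = 1.0 - rho * rho
    exponent = -(rho * rho * (a * a + b * b) - 2.0 * rho * a * b) / (2.0 * one_minus)
    return math.exp(exponent) / math.sqrt(one_minus)


def latent_f(z):
    """Hypothetical latent function of the conditioning variable z (a GP in the paper)."""
    return 1.5 * z


def conditional_copula_density(u, v, z):
    """Copula density whose dependence strength varies with the conditioning variable z."""
    rho = math.tanh(latent_f(z))  # assumed link: real line -> (-1, 1)
    return gaussian_copula_density(u, v, rho)


print(conditional_copula_density(0.5, 0.5, 0.0), conditional_copula_density(0.5, 0.5, 1.0))  # 1.0, then about 2.35
```

At z = 0 the copula is the independence copula (density 1 everywhere). At z = 1 the same pair of variables becomes strongly positively dependent, which a constant-parameter vine could not express.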


Layer-wise learning of deep generative models

Arnold, Ludovic, Ollivier, Yann

arXiv.org Machine Learning, Feb-16-2013

When using deep, multi-layered architectures to build generative models of data, it is difficult to train all layers at once. We propose a layer-wise training procedure admitting a performance guarantee compared to the global optimum. It is based on an optimistic proxy of future performance, the best latent marginal. We interpret auto-encoders in this setting as generative models, by showing that they train a lower bound of this criterion. We test the new learning procedure against a state-of-the-art method (stacked RBMs) and find that it improves performance. Both theory and experiments highlight the importance, when training deep architectures, of using an inference model (from data to hidden variables) richer than the generative model (from hidden variables to data).

artificial intelligence, generative model, machine learning

arXiv.org Machine Learning

1212.1524

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.42)


Density Ratio Hidden Markov Models

Quinn, John A., Sugiyama, Masashi

arXiv.org Machine Learning, Feb-15-2013

Masashi Sugiyama, Department of Computer Science, Tokyo Institute of Technology, Tokyo 152-8552, Japan (sugi@cs.titech.ac.jp)

Hidden Markov models and their variants are the predominant sequential classification method in such domains as speech recognition, bioinformatics and natural language processing. Because they are generative rather than discriminative models, however, their classification performance is a weakness. In this paper we apply ideas from the field of density ratio estimation to bypass the difficult step of learning likelihood functions in HMMs. By reformulating inference and model fitting in terms of density ratios and applying a fast kernel-based estimation method, we show that it is possible to obtain a striking increase in discriminative performance while retaining the probabilistic qualities of the HMM. We demonstrate experimentally that this formulation makes more efficient use of training data than alternative approaches.

1 Introduction

Inference of a sequence of estimated classes from a sequence of noisy observations is fundamental in many applications. The hidden Markov model (HMM) and its variants are the usual methods employed to do this, and have been used with conspicuous success in such domains as speech recognition, bioinformatics and natural language processing. As well as being computationally efficient, they are a popular choice due to their intuitive probabilistic interpretation.

artificial intelligence, estimation, machine learning

arXiv.org Machine Learning

1302.37

Country: Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.44)

Genre: Research Report (0.65)

Industry: Health & Medicine > Therapeutic Area (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
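
The abstract above mentions "a fast kernel-based estimation method" for density ratios without naming it. One well-known method of this kind is unconstrained least-squares importance fitting (uLSIF). It models r(x) = p_nu(x) / p_de(x) as a kernel expansion and obtains the weights from a single regularized linear solve. The following minimal sketch assumes uLSIF; we do not claim it is exactly the estimator used in the paper, and the HMM reformulation is not shown. The kernel width `sigma` and regularizer `lam` are hypothetical defaults.

```python
import math


def gaussian_kernel(x, c, sigma):
    return math.exp(-(x - c) ** 2 / (2.0 * sigma ** 2))


def solve(matrix, rhs):
    """Solve a small dense linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def ulsif(numerator, denominator, sigma=0.5, lam=0.1):
    """Fit r(x) ~ p_nu(x) / p_de(x) as a kernel expansion centred on the numerator samples."""
    centres = numerator
    nb = len(centres)
    phi_de = [[gaussian_kernel(x, c, sigma) for c in centres] for x in denominator]
    phi_nu = [[gaussian_kernel(x, c, sigma) for c in centres] for x in numerator]
    # H averages kernel outer products over denominator samples; h averages kernels over numerator samples.
    H = [[sum(row[i] * row[j] for row in phi_de) / len(denominator) for j in range(nb)]
         for i in range(nb)]
    h = [sum(row[i] for row in phi_nu) / len(numerator) for i in range(nb)]
    for i in range(nb):
        H[i][i] += lam  # ridge regularization
    alpha = [max(a, 0.0) for a in solve(H, h)]  # clip negative weights so r(x) >= 0
    return lambda x: sum(a * gaussian_kernel(x, c, sigma) for a, c in zip(alpha, centres))


numerator = [-0.6, -0.3, 0.0, 0.3, 0.6]                          # illustrative samples from p_nu
denominator = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]  # illustrative samples from p_de
ratio = ulsif(numerator, denominator)
print(ratio(0.0), ratio(1.8))
```

Because the fit reduces to one linear system, neither density is ever estimated on its own. This is the sense in which density ratio methods sidestep learning likelihood functions.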
