AITopics | Learning Graphical Models

Collaborating Authors

Learning Graphical Models

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning. (Wikipedia)

News Overviews Instructional Materials AI-Alerts Classics

Beta diffusion trees and hierarchical feature allocations

Heaukulani, Creighton, Knowles, David A., Ghahramani, Zoubin

arXiv.org Machine LearningApr-3-2015

We define the beta diffusion tree, a random tree structure with a set of leaves that defines a collection of overlapping subsets of objects, known as a feature allocation. A generative process for the tree structure is defined in terms of particles (representing the objects) diffusing in some continuous space, analogously to the Dirichlet diffusion tree (Neal, 2003b), which defines a tree structure over partitions (i.e., non-overlapping subsets) of the objects. Unlike in the Dirichlet diffusion tree, multiple copies of a particle may exist and diffuse along multiple branches in the beta diffusion tree, and an object may therefore belong to multiple subsets of particles. We demonstrate how to build a hierarchically-clustered factor analysis model with the beta diffusion tree and how to perform inference over the random tree structures with a Markov chain Monte Carlo algorithm. We conclude with several numerical experiments on missing data problems with data sets of gene expression microarrays, international development statistics, and intranational socioeconomic measurements.

artificial intelligence, diffusion tree, machine learning, (17 more...)

arXiv.org Machine Learning

1408.3378

Country:

South America (1.00)
Europe (1.00)
Africa (1.00)
(2 more...)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.54)
Health & Medicine > Public Health (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.88)

Add feedback

Individual Planning in Agent Populations: Exploiting Anonymity and Frame-Action Hypergraphs

Sonu, Ekhlas, Chen, Yingke, Doshi, Prashant

arXiv.org Artificial IntelligenceApr-2-2015

Interactive partially observable Markov decision processes (I-POMDP) provide a formal framework for planning for a self-interested agent in multiagent settings. An agent operating in a multiagent environment must deliberate about the actions that other agents may take and the effect these actions have on the environment and the rewards it receives. Traditional I-POMDPs model this dependence on the actions of other agents using joint action and model spaces. Therefore, the solution complexity grows exponentially with the number of agents thereby complicating scalability. In this paper, we model and extend anonymity and context-specific independence -- problem structures often present in agent populations -- for computational gain. We empirically demonstrate the efficiency from exploiting these problem structures by solving a new multiagent problem involving more than 1,000 agents.

agent, artificial intelligence, machine learning, (18 more...)

arXiv.org Artificial Intelligence

1503.0722

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Add feedback

A Probabilistic Theory of Deep Learning

Patel, Ankit B., Nguyen, Tan, Baraniuk, Richard G.

arXiv.org Machine LearningApr-2-2015

A grand challenge in machine learning is the development of computational algorithms that match or outperform humans in perceptual inference tasks that are complicated by nuisance variation. For instance, visual object recognition involves the unknown object position, orientation, and scale in object recognition while speech recognition involves the unknown voice pronunciation, pitch, and speed. Recently, a new breed of deep learning algorithms have emerged for high-nuisance inference tasks that routinely yield pattern recognition systems with near- or super-human capabilities. But a fundamental question remains: Why do they work? Intuitions abound, but a coherent framework for understanding, analyzing, and synthesizing deep learning architectures has remained elusive. We answer this question by developing a new probabilistic framework for deep learning based on the Deep Rendering Model: a generative probabilistic model that explicitly captures latent nuisance variation. By relaxing the generative model to a discriminative one, we can recover two of the current leading deep learning systems, deep convolutional neural networks and random decision forests, providing insights into their successes and shortcomings, as well as a principled route to their improvement.

algorithm, artificial intelligence, machine learning, (18 more...)

arXiv.org Machine Learning

1504.00641

Country: North America > United States (0.28)

Genre: Research Report (0.81)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Signatures of Infinity: Nonergodicity and Resource Scaling in Prediction, Complexity, and Learning

Crutchfield, James P., Marzen, Sarah

arXiv.org Machine LearningApr-1-2015

Truly complex stochastic processes--the infinitary processes [1] whose mutual information between past and future diverges--arise in many physical and biological systems [2-5], such as those in critical states. They are implicated in many natural phenomena, from the geophysics of earthquakes [6] and physiological measurements of neural avalanches [7] to semantics in natural language [8] and cascading failure in power transmission grids [9]. Their apparent infinite memory makes empirical estimation and modeling particularly challenging. The difficulty is reflected in the computational complexity of inference [10]: the resources required to predict and model them diverge in sample size, in memory for storing model parameters, and in memory required for prediction. Resource scaling, an analog of the venerable technique of finite-size scaling in statistical mechanics, suggests that for infinitary processes we look for statistical signatures that track divergences. Since resource divergences are sensitive to a process's inherent randomness and organization, one hopes that their scaling forms are uniquely revealing indicators of process complexity and can guide the selection of appropriate models. To date, though, there are few tractable constructions with which to explore possible general relationships between prediction, complexity, and learning for infinitary processes.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1504.00386

Country:

North America > United States > California (0.46)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)

Genre: Research Report (0.40)

Industry:

Health & Medicine (0.54)
Energy > Power Industry (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Add feedback

Bayesian Clustering of Shapes of Curves

Zhang, Zhengwu, Pati, Debdeep, Srivastava, Anuj

arXiv.org Machine LearningApr-1-2015

The general goal here is to choose groups (clusters) of objects so as to maximize homogeneity within clusters and minimize homogeneity across clusters. The clustering problem has been addressed by researchers in many disciplines. A few well-known methods are metric based e.g. K-means (MacQueen et al., 1967), hierarchical clustering (Ward, 1963), clustering based on principal components, spectral clustering (Ng et al., 2002) and so on (Jain and Dubes, 1988; Ozawa, 1985). Traditional clustering methods are complemented by methods based on a probability model where one assumes a data generating distribution (e.g., Gaussian) and infers clustering configurations that maximize certain objective function (Banfield and Raftery, 1993; Fraley and Raftery, 1998, 2002, 2006; MacCullagh and Yang, 2008). A modelbased clustering can be useful in addressing challenges posed by traditional clustering methods. This is because a probability model allows the number of clusters to be treated as a parameter in the model, and can be embedded in a Bayesian framework providing quantification of uncertainty in the number of clusters and clustering configurations.

artificial intelligence, machine learning, matrix, (16 more...)

arXiv.org Machine Learning

1504.00377

Country: North America > United States (0.46)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

The Libra Toolkit for Probabilistic Models

Lowd, Daniel, Rooshenas, Amirmohammad

arXiv.org Artificial IntelligenceApr-1-2015

The Libra Toolkit is a collection of algorithms for learning and inference with discrete probabilistic models, including Bayesian networks, Markov networks, dependency networks, and sum-product networks. Compared to other toolkits, Libra places a greater emphasis on learning the structure of tractable models in which exact inference is efficient. It also includes a variety of algorithms for learning graphical models in which inference is potentially intractable, and for performing exact and approximate inference. Libra is released under a 2-clause BSD license to encourage broad use in academia and industry.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Artificial Intelligence

1504.0011

Country: North America > United States > Oregon > Lane County > Eugene (0.15)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Computing Convex Coverage Sets for Faster Multi-objective Coordination

Roijers, Diederik Marijn, Whiteson, Shimon, Oliehoek, Frans A.

Journal of Artificial Intelligence ResearchMar-31-2015

In this article, we propose new algorithms for multi-objective coordination graphs (MO-CoGs). Key to the efficiency of these algorithms is that they compute a convex coverage set (CCS) instead of a Pareto coverage set (PCS). Not only is a CCS a sufficient solution set for a large class of problems, it also has important characteristics that facilitate more efficient solutions. We propose two main algorithms for computing a CCS in MO-CoGs. Convex multi-objective variable elimination (CMOVE) computes a CCS by performing a series of agent eliminations, which can be seen as solving a series of local multi-objective subproblems. Variable elimination linear support (VELS) iteratively identifies the single weight vector, w, that can lead to the maximal possible improvement on a partial CCS and calls variable elimination to solve a scalarized instance of the problem for w. VELS is faster than CMOVE for small and medium numbers of objectives and can compute an ε-approximate CCS in a fraction of the runtime. In addition, we propose variants of these methods that employ AND/OR tree search instead of variable elimination to achieve memory efficiency. We analyze the runtime and space complexities of these methods, prove their correctness, and compare them empirically against a naive baseline and an existing PCS method, both in terms of memory-usage and runtime. Our results show that, by focusing on the CCS, these methods achieve much better scalability in the number of agents than the current state of the art.

agent, ccs, vector, (16 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4550

AI Access Foundation

10933

Journal of Artificial Intelligence Research

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Orange County > Irvine (0.04)
North America > Canada > British Columbia (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.36)

Add feedback

A Theory of Feature Learning

van Rooyen, Brendan, Williamson, Robert C.

arXiv.org Machine LearningMar-31-2015

Feature Learning aims to extract relevant information contained in data sets in an automated fashion. It is driving force behind the current deep learning trend, a set of methods that have had widespread empirical success. What is lacking is a theoretical understanding of different feature learning schemes. This work provides a theoretical framework for feature learning and then characterizes when features can be learnt in an unsupervised fashion. We also provide means to judge the quality of features via rate-distortion theory and its generalizations.

artificial intelligence, information, machine learning, (17 more...)

arXiv.org Machine Learning

1504.00083

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Thompson Sampling for Learning Parameterized Markov Decision Processes

Gopalan, Aditya, Mannor, Shie

arXiv.org Machine LearningMar-31-2015

We consider reinforcement learning in parameterized Markov Decision Processes (MDPs), where the parameterization may induce correlation across transition probabilities or rewards. Consequently, observing a particular state transition might yield useful information about other, unobserved, parts of the MDP. We present a version of Thompson sampling for parameterized reinforcement learning problems, and derive a frequentist regret bound for priors over general parameter spaces. The result shows that the number of instants where suboptimal actions are chosen scales logarithmically with time, with high probability. It holds for prior distributions that put significant probability near the true model, without any additional, specific closed-form structure such as conjugate or product-form priors. The constant factor in the logarithmic scaling encodes the information complexity of learning the MDP in terms of the Kullback-Leibler geometry of the parameter space.

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1406.7498

Country: Asia (0.46)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

A Parzen-based distance between probability measures as an alternative of summary statistics in Approximate Bayesian Computation

Zuluaga, Carlos D., Valencia, Edgar A., Álvarez, Mauricio A.

arXiv.org Machine LearningMar-30-2015

Approximate Bayesian Computation (ABC) are likelihood-free Monte Carlo methods. ABC methods use a comparison between simulated data, using different parameters drew from a prior distribution, and observed data. This comparison process is based on computing a distance between the summary statistics from the simulated data and the observed data. For complex models, it is usually difficult to define a methodology for choosing or constructing the summary statistics. Recently, a nonparametric ABC has been proposed, that uses a dissimilarity measure between discrete distributions based on empirical kernel embeddings as an alternative for summary statistics. The nonparametric ABC outperforms other methods including ABC, kernel ABC or synthetic likelihood ABC. However, it assumes that the probability distributions are discrete, and it is not robust when dealing with few observations. In this paper, we propose to apply kernel embeddings using an smoother density estimator or Parzen estimator for comparing the empirical data distributions, and computing the ABC posterior. Synthetic data and real data were used to test the Bayesian inference of our method. We compare our method with respect to state-of-the-art methods, and demonstrate that our method is a robust estimator of the posterior distribution in terms of the number of observations.

artificial intelligence, machine learning, summary statistics, (15 more...)

arXiv.org Machine Learning

1503.08727

Genre:

Research Report (0.70)
Workflow (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.73)

Add feedback