AITopics | Learning Graphical Models

Learning Graphical Models

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning. (Wikipedia)

Bayesian Model Selection of Stochastic Block Models

Yan, Xiaoran

arXiv.org Machine Learning, May-23-2016

Abstract: A central problem in analyzing networks is partitioning them into modules or communities. One of the best tools for this is the stochastic block model, which clusters vertices into blocks with a statistically homogeneous pattern of links. Despite its flexibility and popularity, there has been a lack of principled statistical model selection criteria for the stochastic block model. Here we propose a Bayesian framework for choosing the number of blocks as well as comparing it to the more elaborate degree-corrected block models, ultimately leading to a universal model selection framework capable of comparing multiple modeling combinations. We will also investigate its connection to the minimum description length principle.

Introduction: An important task in network analysis is community detection, or finding groups of similar vertices which can then be analyzed separately [1]. Community structures offer clues to the processes which generated the graph, on scales ranging from face-to-face social interaction [2] through social-media communications [3] to the organization of food webs [4]. However, previous work often defines a "community" as a group of vertices with a high density of connections within the group and a low density of connections to the rest of the network. While this type of assortative community structure is generally the case in social networks, we are interested in a more general definition of functional community: a group of vertices that connect to the rest of the network in similar ways. A set of similar predators form a functional group in a food web, not because they eat each other, but because they feed on similar prey.
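To make the idea of a functional community concrete, here is a minimal sketch of sampling a graph from a plain (not degree-corrected) stochastic block model. The function name `sample_sbm`, the block sizes, and the link probabilities are illustrative assumptions, not taken from the paper; the example only shows how blocks can be defined by how they connect to other blocks rather than by dense internal links.

```python
import random

def sample_sbm(block_of, link_prob, seed=0):
    """Sample an undirected graph from a stochastic block model.

    block_of[v] is the block label of vertex v; link_prob[r][s] is the
    probability of an edge between a vertex in block r and one in block s.
    Returns a list of edges (u, v) with u < v.
    """
    rng = random.Random(seed)
    n = len(block_of)
    edges = []
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < link_prob[block_of[u]][block_of[v]]:
                edges.append((u, v))
    return edges

# Hypothetical food-web-like setting: block 0 = predators, block 1 = prey.
# Predators rarely link to each other but often link to prey.
block_of = [0] * 5 + [1] * 15
link_prob = [[0.02, 0.60],
             [0.60, 0.05]]
edges = sample_sbm(block_of, link_prob)

# Count predator-predator edges versus predator-prey edges.
within_predators = sum(1 for u, v in edges if block_of[u] == 0 and block_of[v] == 0)
predator_prey = sum(1 for u, v in edges if block_of[u] != block_of[v])
print(within_predators, predator_prey)
```

In this toy setting the predator block is not assortative, yet its vertices are statistically interchangeable, which is the sense of "community" the introduction describes.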

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1605.07057

Country:

North America > United States (0.68)
Europe > United Kingdom > England (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Bayesian leave-one-out cross-validation approximations for Gaussian latent variable models

Vehtari, Aki, Mononen, Tommi, Tolvanen, Ville, Sivula, Tuomas, Winther, Ole

arXiv.org Machine Learning, May-23-2016

The future predictive performance of a Bayesian model can be estimated using Bayesian cross-validation. In this article, we consider Gaussian latent variable models where the integration over the latent values is approximated using the Laplace method or expectation propagation (EP). We study the properties of several Bayesian leave-one-out (LOO) cross-validation approximations that in most cases can be computed with a small additional cost after forming the posterior approximation given the full data. Our main objective is to assess the accuracy of the approximative LOO cross-validation estimators. That is, for each method (Laplace and EP) we compare the approximate fast computation with the exact brute force LOO computation. Secondarily, we evaluate the accuracy of the Laplace and EP approximations themselves against a ground truth established through extensive Markov chain Monte Carlo simulation. Our empirical results show that the approach based upon a Gaussian approximation to the LOO marginal distribution (the so-called cavity distribution) gives the most accurate and reliable results among the fast methods.
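As a rough illustration of the "exact brute force LOO computation" that the fast approximations are compared against, the sketch below refits a model once per held-out point and sums the log predictive densities. It assumes a toy conjugate model (Gaussian observations with known noise `sigma` and a Gaussian prior on the mean), which is far simpler than the Gaussian latent variable models in the article; all function names and default values are hypothetical.

```python
import math

def posterior(ys, sigma, m0, s0):
    """Posterior mean and variance of mu for y_i ~ N(mu, sigma^2), mu ~ N(m0, s0^2)."""
    prec = 1.0 / s0 ** 2 + len(ys) / sigma ** 2
    mean = (m0 / s0 ** 2 + sum(ys) / sigma ** 2) / prec
    return mean, 1.0 / prec

def log_normal_pdf(y, mean, var):
    """Log density of N(mean, var) evaluated at y."""
    return -0.5 * (math.log(2 * math.pi * var) + (y - mean) ** 2 / var)

def brute_force_loo(ys, sigma=1.0, m0=0.0, s0=10.0):
    """Sum over i of log p(y_i | all data except y_i), refitting n times."""
    total = 0.0
    for i, y in enumerate(ys):
        rest = ys[:i] + ys[i + 1:]
        mean, var = posterior(rest, sigma, m0, s0)
        # The predictive variance adds observation noise to posterior uncertainty.
        total += log_normal_pdf(y, mean, var + sigma ** 2)
    return total

print(brute_force_loo([0.3, -0.1, 0.8, 1.2, 0.5]))
```

The cost is one refit per observation, which is what makes the cheaper approximations studied in the article attractive when each refit is expensive.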

approximation, artificial intelligence, machine learning, (19 more...)

arXiv.org Machine Learning

1412.7461

Country: Europe (0.67)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Cross Validation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Markov Chains Through the Lens of Dynamical Systems: The Case of Evolution

#artificialintelligence, May-22-2016, 18:31:12 GMT

In this post, we will see the main technical ideas in the analysis of the mixing time of evolutionary Markov chains introduced in a previous post. We start by introducing the notion of the expected motion of a stochastic process or a Markov chain. In the case of a finite population evolutionary Markov chain, the expected motion turns out to be a dynamical system which corresponds to the infinite population evolutionary dynamics with the same parameters. Surprisingly, we show that the limit sets of this dynamical system govern the mixing time of the Markov chain. In particular, if the underlying dynamical system has a unique stable fixed point (as in asexual evolution), then the mixing is fast and in the case of multiple stable fixed points (as in sexual evolution), the mixing is slow.
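The notion of expected motion can be sketched with a generic two-type, Wright-Fisher-style chain. This is an assumed stand-in, not necessarily the model analyzed in the post; the fitness values, the mutation rate `mu`, and the function names are hypothetical. Each generation resamples `n` individuals, so the expected next-generation fraction of type A equals a deterministic map applied to the current fraction, and that map is the infinite-population dynamics.

```python
import random

def expected_motion(x, fitness_a=1.2, fitness_b=1.0, mu=0.01):
    """Deterministic (infinite-population) update for the fraction x of type A."""
    # Selection reweights the two types by fitness.
    selected = fitness_a * x / (fitness_a * x + fitness_b * (1 - x))
    # Symmetric mutation flips each type with probability mu.
    return selected * (1 - mu) + (1 - selected) * mu

def chain_step(k, n, rng):
    """One step of the finite-population chain: k of n individuals are type A."""
    p = expected_motion(k / n)
    # Binomial(n, p) draw, so E[next k / n | k] = expected_motion(k / n).
    return sum(1 for _ in range(n) if rng.random() < p)

def iterate_map(x, steps=200):
    """Iterate the dynamical system from the starting fraction x."""
    for _ in range(steps):
        x = expected_motion(x)
    return x

# Different starting points converge to the same fixed point under these parameters.
print(iterate_map(0.1), iterate_map(0.9))
```

With these assumed parameters the map has a single stable fixed point in [0, 1], the situation the post associates with fast mixing.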

artificial intelligence, machine learning, markov chain, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Visual Information Theory -- colah's blog

#artificialintelligence, May-22-2016, 15:02:08 GMT

I love the feeling of having a new way to think about the world. I especially love when there's some vague idea that gets formalized into a concrete concept. Information theory is a prime example of this. Information theory gives us precise language for describing a lot of things. How uncertain am I? How much does knowing the answer to question A tell me about the answer to question B? How similar is one set of beliefs to another? I've had informal versions of these ideas since I was a young child, but information theory crystallizes them into precise, powerful ideas. These ideas have an enormous variety of applications, from the compression of data, to quantum physics, to machine learning, and vast fields in between. Unfortunately, information theory can seem kind of intimidating. I don't think there's any reason it should be. In fact, many core ideas can be explained completely visually! Before we dive into information theory, let's think about how we can visualize simple probability distributions. We'll need this later on, and it's convenient to address now. As a bonus, these tricks for visualizing probability are pretty useful in and of themselves! Sometimes it rains, but mostly there's sun! Let's say it's sunny 75% of the time. It's easy to make a picture of that: Most days, I wear a t-shirt, but some days I wear a coat. Let's say I wear a coat 38% of the time. It's also easy to make a picture for that! What if I want to visualize both at the same time?
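One way to put both distributions in a single picture is a joint table. The sketch below assumes, purely for illustration, that weather and clothing are independent, so each joint probability is a product of the two marginals from the excerpt (75% sunny, 38% coat). The helper names are hypothetical, and the entropy function is included only as a small preview of the "how uncertain am I?" question.

```python
import math

# Marginals taken from the excerpt; the complements are implied.
p_weather = {"sunny": 0.75, "rainy": 0.25}
p_clothing = {"t-shirt": 0.62, "coat": 0.38}

def joint_if_independent(p_a, p_b):
    """Joint distribution under the assumption that the two variables are independent."""
    return {(a, b): pa * pb for a, pa in p_a.items() for b, pb in p_b.items()}

def entropy_bits(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

joint = joint_if_independent(p_weather, p_clothing)
for outcome, p in sorted(joint.items()):
    print(outcome, round(p, 4))
print(round(entropy_bits(p_weather), 4))
```

If the two variables interact (for example, coats being more likely on rainy days), the joint table can no longer be built from the marginals alone and the conditional probabilities are needed.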

artificial intelligence, codeword, machine learning, (16 more...)

#artificialintelligence

Country:

Oceania > Australia (0.04)
North America > United States > California (0.04)

Genre: Personal > Interview (0.34)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Machine Learning basics for a newbie

#artificialintelligence, May-21-2016, 01:10:27 GMT

Teaching machines involves a structured process in which each stage builds a better version of the machine.

artificial intelligence, learning, machine learning, (12 more...)

#artificialintelligence

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.31)

Factored Temporal Sigmoid Belief Networks for Sequence Learning

Song, Jiaming, Gan, Zhe, Carin, Lawrence

arXiv.org Machine Learning, May-21-2016

Deep conditional generative models are developed to simultaneously learn the temporal dependencies of multiple sequences. The model is designed by introducing a three-way weight tensor to capture the multiplicative interactions between side information and sequences. The proposed model builds on the Temporal Sigmoid Belief Network (TSBN), a sequential stack of Sigmoid Belief Networks (SBNs). The transition matrices are further factored to reduce the number of parameters and improve generalization. When side information is not available, a general framework for semi-supervised learning based on the proposed model is constituted, allowing robust sequence classification. Experimental results show that the proposed approach achieves state-of-the-art predictive and classification performance on sequential data, and has the capacity to synthesize sequences, with controlled style transitioning and blending.
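The parameter saving from factoring a three-way weight tensor can be sketched as follows. The specific form used here, in which side information `y` gates a set of `F` factors between two projection matrices, is a common way to factor such tensors and is an assumption about the paper's construction; the names `w_a`, `w_b`, `w_c` and the dimensions are hypothetical.

```python
def factored_weights(w_a, w_b, w_c, y):
    """Build the J x H matrix W(y)[j][h] = sum_f w_a[j][f] * (w_b[f] . y) * w_c[f][h].

    w_a is J x F, w_b is F x S, w_c is F x H, and y is a length-S side-information vector.
    """
    n_factors = len(w_b)
    # One gate value per factor: how strongly the side information activates it.
    gate = [sum(w * v for w, v in zip(w_b[f], y)) for f in range(n_factors)]
    n_out = len(w_c[0])
    return [
        [sum(row[f] * gate[f] * w_c[f][h] for f in range(n_factors)) for h in range(n_out)]
        for row in w_a
    ]

def param_counts(j, s, h, f):
    """Parameters in a full J x S x H tensor versus the three factor matrices."""
    return j * s * h, f * (j + s + h)

print(param_counts(100, 10, 100, 50))
```

Under these assumed sizes the full tensor needs 100,000 parameters while the factored form needs 10,500, which is the kind of reduction the abstract refers to.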

artificial intelligence, machine learning, sigmoid belief network, (16 more...)

arXiv.org Machine Learning

1605.06715

Country:

North America > United States (0.28)
Asia > China (0.28)

Genre: Research Report > New Finding (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.93)

Automatic Wordnet Development for Low-Resource Languages using Cross-Lingual WSD

Taghizadeh, Nasrin, Faili, Hesham

Journal of Artificial Intelligence Research, May-20-2016

Wordnets are an effective resource for natural language processing and information retrieval, especially for semantic processing and meaning-related tasks. So far, wordnets have been constructed for many languages. However, the automatic development of wordnets for low-resource languages has not been well studied. In this paper, an Expectation-Maximization algorithm is used to create high-quality, large-scale wordnets for poor-resource languages. The proposed method benefits from cross-lingual word sense disambiguation and develops a wordnet using only a bilingual dictionary and a monolingual corpus. The proposed method has been applied to the Persian language and the resulting wordnet has been evaluated through several experiments. The results show that the induced wordnet has a precision score of 90% and a recall score of 35%.

synset, wordnet, wordnet synset, (15 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4968

AI Access Foundation

11003

Country:

Asia > Middle East > Iran > Tehran Province > Tehran (0.04)
Europe > Italy > Liguria > Genoa (0.04)
Asia > South Korea (0.04)
(18 more...)

Genre:

Research Report > New Finding (1.00)
Workflow (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Unsupervised Feature Extraction by Time-Contrastive Learning and Nonlinear ICA

Hyvarinen, Aapo, Morioka, Hiroshi

arXiv.org Machine Learning, May-20-2016

Nonlinear independent component analysis (ICA) provides an appealing framework for unsupervised feature learning, but the models proposed so far are not identifiable. Here, we first propose a new intuitive principle of unsupervised deep learning from time series which uses the nonstationary structure of the data. Our learning principle, time-contrastive learning (TCL), finds a representation which allows optimal discrimination of time segments (windows). Surprisingly, we show how TCL can be related to a nonlinear ICA model, when ICA is redefined to include temporal nonstationarities. In particular, we show that TCL combined with linear ICA estimates the nonlinear ICA model up to point-wise transformations of the sources, and this solution is unique, thus providing the first identifiability result for nonlinear ICA which is rigorous, constructive, as well as very general.
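The data preparation behind "discrimination of time segments" can be sketched in a few lines: split the series into windows and use each window's index as a class label for every sample inside it. This covers only the labeling step, under the assumption of equal-length, non-overlapping segments; the classifier and feature extractor are omitted, and the function names are hypothetical.

```python
def segment_labels(n_samples, segment_length):
    """Label each time index with the index of the segment (window) containing it."""
    if segment_length <= 0:
        raise ValueError("segment_length must be positive")
    return [t // segment_length for t in range(n_samples)]

def make_tcl_dataset(series, segment_length):
    """Pair every observation with its segment label for a multiclass classifier."""
    labels = segment_labels(len(series), segment_length)
    return list(zip(series, labels))

# Toy series of 7 scalar observations split into windows of length 3.
print(make_tcl_dataset([0.1, 0.4, 0.2, 1.5, 1.7, 1.6, -0.3], 3))
```

A classifier trained to predict these labels from the observations has to pick up on whatever statistics change between segments, which is the nonstationarity the abstract says TCL exploits.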

artificial intelligence, feature extractor, machine learning, (13 more...)

arXiv.org Machine Learning

1605.06336

Genre: Research Report (0.65)

Industry:

Health & Medicine > Health Care Technology (0.46)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

ATD: Anomalous Topic Discovery in High Dimensional Discrete Data

Soleimani, Hossein, Miller, David J.

arXiv.org Machine Learning, May-20-2016

We propose an algorithm for detecting patterns exhibited by anomalous clusters in high dimensional discrete data. Unlike most anomaly detection (AD) methods, which detect individual anomalies, our proposed method detects groups (clusters) of anomalies; i.e., sets of points which collectively exhibit abnormal patterns. In many applications this can lead to better understanding of the nature of the atypical behavior and to identifying the sources of the anomalies. Moreover, we consider the case where the atypical patterns appear on only a small (salient) subset of the very high dimensional feature space. Individual AD techniques and techniques that detect anomalies using all the features typically fail to detect such anomalies, but our method can detect such instances collectively, discover the shared anomalous patterns exhibited by them, and identify the subsets of salient features. In this paper, we focus on detecting anomalous topics in a batch of text documents, developing our algorithm based on topic models. Results of our experiments show that our method can accurately detect anomalous topics and salient features (words) under each such topic in a synthetic data set and two real-world text corpora, and achieves better performance compared to both standard group AD and individual AD techniques. All required code to reproduce our experiments is available from https://github.com/hsoleimani/ATD

data mining, machine learning, proportion, (22 more...)

arXiv.org Machine Learning

doi: 10.1109/TKDE.2016.2561288

1512.06452

Country:

Asia (0.28)
North America > United States (0.28)

Genre: Research Report > Experimental Study (0.69)

Industry: Law Enforcement & Public Safety (0.46)

Technology:

Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(3 more...)

On the estimation of initial conditions in kernel-based system identification

Risuleo, Riccardo Sven, Bottegal, Giulio, Hjalmarsson, Håkan

arXiv.org Machine Learning, May-19-2016

Recent developments in system identification have brought attention to regularized kernel-based methods, where, adopting the recently introduced stable spline kernel, prior information on the unknown process is enforced. This reduces the variance of the estimates and thus makes kernel-based methods particularly attractive when few input-output data samples are available. In such cases, however, the system initial conditions may have a significant impact on the output dynamics. In this paper, we specifically address this point. We propose three methods that deal with the estimation of initial conditions using different types of information. The methods consist of various mixed maximum likelihood--a posteriori estimators which estimate the initial conditions and tune the hyperparameters characterizing the stable spline kernel. To solve the related optimization problems, we resort to the expectation-maximization method, showing that the solutions can be attained by iterating among simple update steps. Numerical experiments show the advantages, in terms of accuracy in reconstructing the system impulse response, of the proposed strategies, compared to other kernel-based schemes not accounting for the effect of initial conditions.
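For readers unfamiliar with the stable spline kernel, the sketch below builds the covariance matrix of its first-order form (often called the TC kernel), whose entries are `lam * beta ** max(i, j)` for impulse-response lags `i, j = 1..n`. The abstract does not say which order of the kernel the authors use, so this choice is an assumption, as are the function name and the default hyperparameter values.

```python
def stable_spline_kernel(n, beta=0.8, lam=1.0):
    """First-order stable spline (TC) kernel matrix for an impulse response of length n.

    Entry (i, j) is lam * beta ** max(i, j) with lags counted from 1, so the
    prior variance decays exponentially along the impulse response.
    """
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return [
        [lam * beta ** max(i, j) for j in range(1, n + 1)]
        for i in range(1, n + 1)
    ]

for row in stable_spline_kernel(3, beta=0.5):
    print(row)
```

The decay rate `beta` and the scale `lam` play the role of the hyperparameters that the abstract says are tuned alongside the initial conditions.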

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

doi: 10.1109/cdc.2015.7402361

1504.08196

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)
