AITopics | Learning Graphical Models


Learning Graphical Models

A graphical model or probabilistic graphical model (PGM) or structured probabilistic model is a probabilistic model for which a graph expresses the conditional dependence structure between random variables. They are commonly used in probability theory, statistics—particularly Bayesian statistics—and machine learning. (Wikipedia)


Bayesian Kernel and Mutual $k$-Nearest Neighbor Regression

Kim, Hyun-Chul

arXiv.org Machine Learning, Aug-3-2016

We propose Bayesian extensions of two nonparametric regression methods: kernel regression and mutual $k$-nearest neighbor regression. Derived from Gaussian process models for regression, the extensions provide distributions for target value estimates and a framework for selecting the hyperparameters. It is shown that the two proposed methods asymptotically converge to kernel regression and mutual $k$-nearest neighbor regression, respectively. The simulation results show that the proposed methods can select proper hyperparameters and are better than or comparable to the original methods on an artificial data set and a real-world data set.
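As a point of reference, classical kernel regression (the non-Bayesian baseline the abstract refers to) can be sketched as a Nadaraya-Watson estimator. This is a minimal illustration, not the paper's Bayesian extension; the Gaussian kernel, the function names, and the `bandwidth` parameter are assumptions chosen for the example.

```python
import math


def gaussian_kernel(u):
    """Unnormalized Gaussian kernel; the normalizing constant cancels in the ratio."""
    return math.exp(-0.5 * u * u)


def kernel_regression(x_train, y_train, x_query, bandwidth):
    """Nadaraya-Watson estimate: a kernel-weighted average of the training targets.

    `bandwidth` is the hyperparameter that a model-selection framework
    would need to choose (hypothetical name, not taken from the paper).
    """
    weights = [gaussian_kernel((x_query - x) / bandwidth) for x in x_train]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return sum(w * y for w, y in zip(weights, y_train)) / total
```

For example, `kernel_regression([0.0, 1.0, 2.0], [0.0, 1.0, 2.0], 1.0, bandwidth=0.5)` averages the three targets with weights that are symmetric around the query point.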

artificial intelligence, machine learning, regression, (16 more...)

arXiv.org Machine Learning

1608.0141

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.90)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)


Robust Non-linear Regression: A Greedy Approach Employing Kernels with Application to Image Denoising

Papageorgiou, George, Bouboulis, Pantelis, Theodoridis, Sergios

arXiv.org Machine Learning, Aug-3-2016

We consider the task of robust non-linear regression in the presence of both inlier noise and outliers. Assuming that the unknown non-linear function belongs to a Reproducing Kernel Hilbert Space (RKHS), our goal is to estimate the set of the associated unknown parameters. Due to the presence of outliers, common techniques such as the Kernel Ridge Regression (KRR) or the Support Vector Regression (SVR) turn out to be inadequate. Instead, we employ sparse modeling arguments to explicitly model and estimate the outliers, adopting a greedy approach. The proposed robust scheme, i.e., Kernel Greedy Algorithm for Robust Denoising (KGARD), is inspired by the classical Orthogonal Matching Pursuit (OMP) algorithm. Specifically, the proposed method alternates between a KRR task and an OMP-like selection step. Theoretical results concerning the identification of the outliers are provided. Moreover, KGARD is compared against other cutting edge methods, where its performance is evaluated via a set of experiments with various types of noise. Finally, the proposed robust estimation framework is applied to the task of image denoising, and its enhanced performance in the presence of outliers is demonstrated.

artificial intelligence, machine learning, outlier, (16 more...)

arXiv.org Machine Learning

doi: 10.1109/TSP.2017.2708029

1601.00595

Genre: Research Report > Promising Solution (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)


Blocking Collapsed Gibbs Sampler for Latent Dirichlet Allocation Models

Zhang, Xin, Sisson, Scott A.

arXiv.org Machine Learning, Aug-2-2016

The latent Dirichlet allocation (LDA) model is a widely used latent variable model in machine learning for text analysis. Inference for this model typically involves a single-site collapsed Gibbs sampling step for latent variables associated with observations. The efficiency of the sampling is critical to the success of the model in practical large-scale applications. In this article, we introduce a blocking scheme to the collapsed Gibbs sampler for the LDA model which can, with a theoretical guarantee, improve chain mixing efficiency. We develop two procedures, an O(K)-step backward simulation and an O(log K)-step nested simulation, to directly sample the latent variables within each block. We demonstrate that the blocking scheme achieves substantial improvements in chain mixing compared to the state-of-the-art single-site collapsed Gibbs sampler. We also show that when the number of topics is over hundreds, the nested-simulation blocking scheme can achieve a significant reduction in computation time compared to the single-site sampler.
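The single-site collapsed Gibbs sampler that the abstract uses as its baseline can be sketched as follows. This is the standard textbook update, not the blocking scheme proposed in the paper; the function name, the symmetric hyperparameters `alpha` and `beta`, and the integer word-id encoding of documents are assumptions made for the illustration.

```python
import random


def collapsed_gibbs_lda(docs, num_topics, vocab_size,
                        alpha=0.1, beta=0.01, sweeps=50, seed=0):
    """Single-site collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (assignments, doc_topic counts, topic_word counts).
    """
    rng = random.Random(seed)
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [[0] * vocab_size for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Random initialization of the topic assignment of every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(num_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(sweeps):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k_old = assignments[d][i]
                doc_topic[d][k_old] -= 1
                topic_word[k_old][w] -= 1
                topic_total[k_old] -= 1

                # Unnormalized conditional of this token's topic given all others.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                k_new = rng.choices(range(num_topics), weights=weights)[0]

                # Add the token back under its newly sampled topic.
                assignments[d][i] = k_new
                doc_topic[d][k_new] += 1
                topic_word[k_new][w] += 1
                topic_total[k_new] += 1

    return assignments, doc_topic, topic_word
```

Each token is resampled one at a time with cost linear in the number of topics K, which is the per-token cost that the O(log K) nested simulation mentioned in the abstract is meant to improve on.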

machine learning, natural language, sampler, (21 more...)

arXiv.org Machine Learning

1608.00945

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)


Combining Random Walks and Nonparametric Bayesian Topic Model for Community Detection

Zhu, Ruimin, Jiang, Wenxin

arXiv.org Machine Learning, Aug-2-2016

Community detection has been an active research area for decades. Among probabilistic models, the Stochastic Block Model has been the most popular. This paper introduces a novel probabilistic model, RW-HDP, based on random walks and the Hierarchical Dirichlet Process, for community extraction. In RW-HDP, random walks conducted in a social network are treated as documents; nodes are treated as words. By using the Hierarchical Dirichlet Process, a nonparametric Bayesian model, we are able not only to cluster nodes into different communities but also to determine the number of communities automatically. We use Stochastic Variational Inference for model inference, which makes our method time-efficient and easy to extend to an online learning algorithm.

data mining, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1607.05573

Country: North America > United States > New York (0.14)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Therapeutic Area (0.47)
Education > Educational Setting (0.34)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)


A Stochastic Temporal Model of Polyphonic MIDI Performance with Ornaments

Nakamura, Eita, Ono, Nobutaka, Sagayama, Shigeki, Watanabe, Kenji

arXiv.org Artificial Intelligence, Aug-2-2016

We study indeterminacies in realization of ornaments and how they can be incorporated in a stochastic performance model applicable for music information processing such as score-performance matching. We point out the importance of temporal information, and propose a hidden Markov model which describes it explicitly and represents ornaments with several state types. Following a review of the indeterminacies, they are carefully incorporated into the model through its topology and parameters, and the state construction for quite general polyphonic scores is explained in detail. By analyzing piano performance data, we find significant overlaps in inter-onset-interval distributions of chordal notes, ornaments, and inter-chord events, and the data is used to determine details of the model. The model is applied for score following and offline score-performance matching, yielding highly accurate matching for performances with many ornaments and relatively frequent errors, repeats, and skips.
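Offline matching with a hidden Markov model is commonly done by decoding the most likely state sequence. The sketch below shows generic Viterbi decoding on a toy two-state model; it is not the paper's model, and the state names, observation symbols, and probabilities are invented purely for illustration.

```python
import math


def to_log(table):
    """Convert a (possibly nested) dict of positive probabilities to log space."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most likely hidden state sequence for the observations."""
    # best[t][s]: best log-probability of any path ending in state s at time t.
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = []
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda r: best[-1][r] + log_trans[r][s])
            scores[s] = best[-1][prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy model (hypothetical): two score positions, each mostly emitting one pitch.
STATES = ["A", "B"]
LOG_START = to_log({"A": 0.9, "B": 0.1})
LOG_TRANS = to_log({"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.1, "B": 0.9}})
LOG_EMIT = to_log({"A": {"C": 0.9, "D": 0.1}, "B": {"C": 0.1, "D": 0.9}})
```

Calling `viterbi(["C", "C", "D", "D"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)` aligns each observed symbol to a state; a score-following system would run an online variant of the same recursion.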

algorithm, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1080/09298215.2015.1078819

1404.2314

Genre: Research Report (0.63)

Industry:

Media > Music (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


Multi Level Monte Carlo methods for a class of ergodic stochastic differential equations

Szpruch, Lukasz, Vollmer, Sebastian, Zygalakis, Konstantinos, Giles, Michael B.

arXiv.org Machine Learning, Jul-31-2016

We develop a framework that allows the use of the multi-level Monte Carlo (MLMC) methodology (Giles 2015) to calculate expectations with respect to the invariant measures of ergodic SDEs. In that context, we study the (over-damped) Langevin equations with strongly convex potential. We show that, when appropriate contracting couplings for the numerical integrators are available, one can obtain time-uniform estimates of the MLMC variance, in stark contrast to the majority of the results in the MLMC literature. As a consequence, one can approximate expectations with respect to the invariant measure in an unbiased way without the need for a Metropolis-Hastings step. In addition, a root mean square error of $\mathcal{O}(\epsilon)$ is achieved with $\mathcal{O}(\epsilon^{-2})$ complexity, on par with Markov Chain Monte Carlo (MCMC) methods, which however can be computationally intensive when applied to large data sets. Finally, we present a multilevel version of the recently introduced Stochastic Gradient Langevin (SGLD) method (Welling and Teh, 2011) built for large-dataset applications. We show that this is the first stochastic gradient MCMC method with complexity $\mathcal{O}(\epsilon^{-2}|\log {\epsilon}|^{3})$, which is asymptotically an order $\epsilon$ lower than the $\mathcal{O}(\epsilon^{-3})$ complexity of all stochastic gradient MCMC methods that are currently available. Numerical experiments confirm our theoretical findings.
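For orientation, the two standard ingredients named in the abstract can be written out. The notation is generic and is an assumption introduced here, not taken from the paper: $U$ is the potential, $W_t$ a Brownian motion, and $P_\ell$ the quantity of interest computed at discretization level $\ell$. The over-damped Langevin equation is

$$dX_t = -\nabla U(X_t)\,dt + \sqrt{2}\,dW_t,$$

whose invariant density is proportional to $e^{-U(x)}$. The MLMC estimator rests on the telescoping identity

$$\mathbb{E}[P_L] = \mathbb{E}[P_0] + \sum_{\ell=1}^{L} \mathbb{E}\left[P_\ell - P_{\ell-1}\right],$$

where each term on the right is estimated independently and the fine and coarse approximations inside each difference are coupled, typically by sharing the same Brownian increments, so that the variance of $P_\ell - P_{\ell-1}$ shrinks as $\ell$ grows.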

artificial intelligence, coupling, machine learning, (16 more...)

arXiv.org Machine Learning

1605.01384

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)


Time-Sensitive Bayesian Information Aggregation for Crowdsourcing Systems

Venanzi, Matteo, Guiver, John, Kohli, Pushmeet, Jennings, Nicholas R.

Journal of Artificial Intelligence Research, Jul-28-2016

Many aspects of the design of efficient crowdsourcing processes, such as defining workers' bonuses, fair prices and time limits of the tasks, involve knowledge of the likely duration of the task at hand. In this work we introduce a new time-sensitive Bayesian aggregation method that simultaneously estimates a task's duration and obtains reliable aggregations of crowdsourced judgments. Our method, called BCCTime, uses latent variables to represent the uncertainty about the workers' completion time, the tasks' duration and the workers' accuracy. To relate the quality of a judgment to the time a worker spends on a task, our model assumes that each task is completed within a latent time window within which all workers with a propensity to genuinely attempt the labelling task (i.e., no spammers) are expected to submit their judgments. In contrast, workers with a lower propensity to valid labelling, such as spammers, bots or lazy labellers, are assumed to perform tasks considerably faster or slower than the time required by normal workers. Specifically, we use efficient message-passing Bayesian inference to learn approximate posterior probabilities of (i) the confusion matrix of each worker, (ii) the propensity to valid labelling of each worker, (iii) the unbiased duration of each task and (iv) the true label of each task. Using two real-world public datasets for entity linking tasks, we show that BCCTime produces up to 11% more accurate classifications and up to 100% more informative estimates of a task's duration compared to state-of-the-art methods.

confusion matrix, dataset, judgment, (13 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.5175

AI Access Foundation

11016


Country:

Europe > Switzerland (0.05)
Asia > India (0.04)
North America > United States > Washington > King County > Redmond (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Communications > Social Media > Crowdsourcing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)


Multiple Testing for Neuroimaging via Hidden Markov Random Field

Shu, Hai, Nan, Bin, Koeppe, Robert

arXiv.org Machine Learning, Jul-28-2016

Traditional voxel-level multiple testing procedures in neuroimaging, mostly $p$-value based, often ignore the spatial correlations among neighboring voxels and thus suffer from substantial loss of power. We extend the local-significance-index based procedure originally developed for the hidden Markov chain models, which aims to minimize the false nondiscovery rate subject to a constraint on the false discovery rate, to three-dimensional neuroimaging data using a hidden Markov random field model. A generalized expectation-maximization algorithm for maximizing the penalized likelihood is proposed for estimating the model parameters. Extensive simulations show that the proposed approach is more powerful than conventional false discovery rate procedures. We apply the method to the comparison between mild cognitive impairment, a disease status with increased risk of developing Alzheimer's or another dementia, and normal controls in the FDG-PET imaging study of the Alzheimer's Disease Neuroimaging Initiative.
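The abstract does not name the conventional false discovery rate procedures used for comparison. As one widely used example of a $p$-value based procedure of that kind, the Benjamini-Hochberg step-up rule can be sketched as follows; the function name and the default level `q` are assumptions made for this illustration, and the sketch ignores spatial correlation entirely, which is the limitation the abstract points to.

```python
def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure at nominal FDR level q.

    Returns a list of booleans, True where the hypothesis is rejected.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])

    # Largest rank k such that the k-th smallest p-value is <= q * k / m.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff = rank

    # Reject every hypothesis whose p-value rank is at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected
```

Each voxel is treated through its own $p$-value only, with no information shared between neighbors.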

artificial intelligence, machine learning, procedure, (16 more...)

arXiv.org Machine Learning

doi: 10.1111/biom.12329

1404.1371

Country:

North America > United States > California (0.46)
North America > United States > Michigan (0.28)

Genre: Research Report (0.64)

Industry:

Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (1.00)
Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)


Variational Mixture Models with Gamma or inverse-Gamma components

Llera, A., Vidaurre, D., Pruim, R. H. R., Beckmann, C. F.

arXiv.org Machine Learning, Jul-26-2016

Mixture models with Gamma and/or inverse-Gamma distributed mixture components are useful for medical image tissue segmentation or as post-hoc models for regression coefficients obtained from linear regression within a Generalised Linear Modeling framework (GLM), used in this case to separate stochastic (Gaussian) noise from some kind of positive or negative "activation" (modeled as Gamma or inverse-Gamma distributed). To date, the most common choice in this context is Gaussian/Gamma mixture models learned through a maximum likelihood (ML) approach; we recently extended this algorithm to mixture models with inverse-Gamma components. Here, we introduce a fully analytical Variational Bayes (VB) learning framework for both Gamma and/or inverse-Gamma components. We use synthetic and resting state fMRI data to compare the performance of the ML and VB algorithms in terms of area under the curve and computational cost. We observed that the ML Gaussian/Gamma model is very expensive, especially when considering high resolution images; furthermore, these solutions are highly variable and can occasionally overestimate the activations severely. The Bayesian Gauss-Gamma is in general the fastest algorithm but provides solutions that are too dense. The maximum likelihood Gaussian/inverse-Gamma is also very fast but in general provides very sparse solutions. The variational Gaussian/inverse-Gamma mixture model is the most robust and its cost is acceptable even for high resolution images. Further, the presented methodology represents an essential building block that can be directly used in more complex inference tasks, specially designed to analyse MRI-fMRI data; such models include, for example, analytical variational mixture models with adaptive spatial regularization or better source models for new spatial blind source separation approaches.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1607.07573

Genre: Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Health Care Technology (0.70)
Health & Medicine > Diagnostic Medicine > Imaging (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)


A New PAC-Bayesian Perspective on Domain Adaptation

Germain, Pascal, Habrard, Amaury, Laviolette, François, Morvant, Emilie

arXiv.org Machine Learning, Jul-26-2016

We study the issue of PAC-Bayesian domain adaptation: we want to learn, from a source domain, a majority vote model dedicated to a target one. Our theoretical contribution brings a new perspective by deriving an upper bound on the target risk where the distributions' divergence, expressed as a ratio, controls the tradeoff between a source error measure and the target voters' disagreement. Our bound suggests that one has to focus on regions where the source data is informative. From this result, we derive a PAC-Bayesian generalization bound, and specialize it to linear classifiers. Then, we infer a learning algorithm and perform experiments on real data.

artificial intelligence, bayesian inference, machine learning, (17 more...)

arXiv.org Machine Learning

1506.04573

Country:

North America (0.46)
Europe > France (0.14)

Genre:

Overview (0.67)
Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
