AITopics

Within a supervised classification framework, labeled data are used to learn classifier parameters. Prior to that, it is generally required to perform dimensionality reduction via feature extraction. These preprocessing steps have motivated numerous research works aiming at recovering latent variables in an unsupervised context. This paper proposes a unified framework to perform classification and low-level modeling jointly. The main objective is to use the estimated latent variables as features for classification and to incorporate simultaneously supervised information to help latent variable extraction. The proposed hierarchical Bayesian model is divided into three stages: a first low-level modeling stage to estimate latent variables, a second stage clustering these features into statistically homogeneous groups and a last classification stage exploiting the (possibly badly) labeled data. Performance of the model is assessed in the specific context of hyperspectral image interpretation, unifying two standard analysis techniques, namely unmixing and classification. Keywords: Bayesian model, supervised learning, image interpretation, Markov Random Field 1. Introduction In the context of image interpretation, numerous methods have been developed to extract meaningful information.

artificial intelligence, classification, machine learning, (20 more...)

1712.00368

Country: North America > United States > New York (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Brouwer, Thomas, Lio', Pietro

Prior and Likelihood Choices for Bayesian Matrix Factorisation on Small Datasets

In this paper, we study the effects of different prior and likelihood choices for Bayesian matrix factorisation, focusing on small datasets. These choices can greatly influence the predictive performance of the methods. We identify four groups of approaches: Gaussian-likelihood with real-valued priors, nonnegative priors, semi-nonnegative models, and finally Poisson-likelihood approaches. For each group we review several models from the literature, considering sixteen in total, and discuss the relations between different priors and matrix norms. We extensively compare these methods on eight real-world datasets across three application areas, giving both inter- and intra-group comparisons. We measure convergence runtime speed, cross-validation performance, sparse and noisy prediction performance, and model selection robustness. We offer several insights into the trade-offs between prior and likelihood choices for Bayesian matrix factorisation on small datasets - such as that Poisson models give poor predictions, and that nonnegative models are more constrained than real-valued ones.

artificial intelligence, bayesian inference, machine learning, (12 more...)

1712.00288

Country:

North America > United States (0.28)
Europe > United Kingdom (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area > Oncology (0.68)

Technology:

Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.47)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.47)

Kargas, Nikos, Sidiropoulos, Nicholas D., Fu, Xiao

Tensors, Learning, and 'Kolmogorov Extension' for Finite-alphabet Random Vectors

Estimating the joint probability mass function (PMF) of a set of random variables lies at the heart of statistical learning and signal processing. Without structural assumptions, such as modeling the variables as a Markov chain, tree, or other graphical model, joint PMF estimation is often considered mission impossible - the number of unknowns grows exponentially with the number of variables. But who gives us the structural model? Is there a generic, 'non-parametric' way to control joint PMF complexity without relying on a priori structural assumptions regarding the underlying probability model? Is it possible to discover the operational structure without biasing the analysis up front? What if we only observe random subsets of the variables, can we still reliably estimate the joint PMF of all? This paper shows, perhaps surprisingly, that if the joint PMF of any three variables can be estimated, then the joint PMF of all the variables can be provably recovered under relatively mild conditions. The result is reminiscent of Kolmogorov's extension theorem - consistent specification of lower-order distributions induces a unique probability measure for the entire process. The difference is that for processes of limited complexity (rank of the high-order PMF) it is possible to obtain complete characterization from only third-order distributions. In fact not all third order PMFs are needed; and under more stringent conditions even second-order will do. Exploiting multilinear (tensor) algebra, this paper proves that such higher-order PMF completion can be guaranteed - several pertinent identifiability results are derived. It also provides a practical and efficient algorithm to carry out the recovery task. Judiciously designed simulations and real-data experiments on movie recommendation and data classification are presented to showcase the effectiveness of the approach.

artificial intelligence, bayesian inference, machine learning, (17 more...)

1712.00205

Country:

North America > United States > Virginia (0.28)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report (0.82)

Industry:

Health & Medicine (0.46)
Media > Film (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Andersen, Michael Riis, Vehtari, Aki, Winther, Ole, Hansen, Lars Kai

Bayesian inference for spatio-temporal spike-and-slab priors

In this work, we address the problem of solving a series of underdetermined linear inverse problemblems subject to a sparsity constraint. We generalize the spike-and-slab prior distribution to encode a priori correlation of the support of the solution in both space and time by imposing a transformed Gaussian process on the spike-and-slab probabilities. An expectation propagation (EP) algorithm for posterior inference under the proposed model is derived. For large scale problems, the standard EP algorithm can be prohibitively slow. We therefore introduce three different approximation schemes to reduce the computational complexity. Finally, we demonstrate the proposed model using numerical experiments based on both synthetic and real data sets.

approximation, artificial intelligence, machine learning, (16 more...)

1509.04752

Country: North America > United States > New York (0.28)

Genre: Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Health Care Technology (0.67)
Health & Medicine > Therapeutic Area > Neurology (0.45)
Health & Medicine > Diagnostic Medicine > Imaging (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Tegho, Christopher, Budzianowski, Paweł, Gašić, Milica

Uncertainty Estimates for Efficient Neural Network-based Dialogue Policy Optimisation

arXiv.org Machine LearningNov-30-2017

In statistical dialogue management, the dialogue manager learns a policy that maps a belief state to an action for the system to perform. Efficient exploration is key to successful policy optimisation. Current deep reinforcement learning methods are very promising but rely on epsilon-greedy exploration, thus subjecting the user to a random choice of action during learning. Alternative approaches such as Gaussian Process SARSA (GPSARSA) estimate uncertainties and are sample efficient, leading to better user experience, but on the expense of a greater computational complexity. This paper examines approaches to extract uncertainty estimates from deep Q-networks (DQN) in the context of dialogue management. We perform an extensive benchmark of deep Bayesian methods to extract uncertainty estimates, namely Bayes-By-Backprop, dropout, its concrete variation, bootstrapped ensemble and alpha-divergences, combining it with DQN algorithm.

machine learning, natural language, reinforcement learning, (14 more...)

1711.11486

Country: Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.15)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
(2 more...)

Terenin, Alexander, Xing, Eric P.

Techniques for proving Asynchronous Convergence results for Markov Chain Monte Carlo methods

arXiv.org Machine LearningNov-30-2017

Markov Chain Monte Carlo (MCMC) methods such as Gibbs sampling are finding widespread use in applied statistics and machine learning. These often lead to difficult computational problems, which are increasingly being solved on parallel and distributed systems such as compute clusters. Recent work has proposed running iterative algorithms such as gradient descent and MCMC in parallel asynchronously for increased performance, with good empirical results in certain problems. Unfortunately, for MCMC this parallelization technique requires new convergence theory, as it has been explicitly demonstrated to lead to divergence on some examples. Recent theory on Asynchronous Gibbs sampling describes why these algorithms can fail, and provides a way to alter them to make them converge. In this article, we describe how to apply this theory in a generic setting, to understand the asynchronous behavior of any MCMC algorithm, including those implemented using parameter servers, and those not based on Gibbs sampling.

artificial intelligence, bayesian inference, machine learning, (15 more...)

1711.06719

Country: North America > United States (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Riemannian Stein Variational Gradient Descent for Bayesian Inference

Liu, Chang, Zhu, Jun

We develop Riemannian Stein Variational Gradient Descent (RSVGD), a Bayesian inference method that generalizes Stein Variational Gradient Descent (SVGD) to Riemann manifold. The benefits are two-folds: (i) for inference tasks in Euclidean spaces, RSVGD has the advantage over SVGD of utilizing information geometry, and (ii) for inference tasks on Riemann manifolds, RSVGD brings the unique advantages of SVGD to the Riemannian world. To appropriately transfer to Riemann manifolds, we conceive novel and non-trivial techniques for RSVGD, which are required by the intrinsically different characteristics of general Riemann manifolds from Euclidean spaces. We also discover Riemannian Stein's Identity and Riemannian Kernelized Stein Discrepancy. Experimental results show the advantages over SVGD of exploring distribution geometry and the advantages of particle-efficiency, iteration-effectiveness and approximation flexibility over other inference methods on Riemann manifolds.

artificial intelligence, machine learning, manifold, (16 more...)

1711.11216

Country:

Asia (0.28)
Europe (0.28)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.91)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.70)

Jethava, Vinay, Dubhashi, Devdatt

GANs for LIFE: Generative Adversarial Networks for Likelihood Free Inference

We introduce a framework using Generative Adversarial Networks (GANs) for likelihood--free inference (LFI) and Approximate Bayesian Computation (ABC). Our approach addresses both the key problems in likelihood--free inference, namely how to compare distributions and how to efficiently explore the parameter space. Our framework allows one to use the simulator model as a black box and leverage the power of deep networks to generate a rich set of features in a data driven fashion (as opposed to previous ad hoc approaches). Thereby it is a step towards a powerful alternative approach to LFI and ABC. On benchmark data sets, our approach improves on others with respect to scalability, ability to handle high dimensional data and complex probability distributions.

artificial intelligence, bayesian inference, machine learning, (17 more...)

1711.11139

Country: Europe > Norway (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.46)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.34)

Chen, Yen-Chi, Wang, Y. Samuel, Erosheva, Elena A.

On the use of bootstrap with variational inference: Theory, interpretation, and a two-sample test example

Variational inference is a general approach for approximating complex density functions, such as those arising in latent variable models, popular in machine learning. It has been applied to approximate the maximum likelihood estimator and to carry out Bayesian inference, however, quantification of uncertainty with variational inference remains challenging from both theoretical and practical perspectives. This paper is concerned with developing uncertainty measures for variational inference by using bootstrap procedures. We first develop two general bootstrap approaches for assessing the uncertainty of a variational estimate and the study the underlying bootstrap theory in both fixed- and increasing-dimension settings. We then use the bootstrap approach and our theoretical results in the context of mixed membership modeling with multivariate binary data on functional disability from the National Long Term Care Survey. We carry out a two-sample approach to test for changes in the repeated measures of functional disability for the subset of individuals present in 1984 and 1994 waves.

artificial intelligence, bayesian inference, machine learning, (17 more...)

1711.11057

Country: North America > United States (0.28)

Genre: Research Report (0.64)

Industry: Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Moerland, Thomas M., Broekens, Joost, Jonker, Catholijn M.

Efficient exploration with Double Uncertain Value Networks

This paper studies directed exploration for reinforcement learning agents by tracking uncertainty about the value of each available action. We identify two sources of uncertainty that are relevant for exploration. The first originates from limited data (parametric uncertainty), while the second originates from the distribution of the returns (return uncertainty). We identify methods to learn these distributions with deep neural networks, where we estimate parametric uncertainty with Bayesian drop-out, while return uncertainty is propagated through the Bellman equation as a Gaussian distribution. Then, we identify that both can be jointly estimated in one network, which we call the Double Uncertain Value Network. The policy is directly derived from the learned distributions based on Thompson sampling. Experimental results show that both types of uncertainty may vastly improve learning in domains with a strong exploration challenge.

artificial intelligence, machine learning, reinforcement learning, (17 more...)

1711.10789

Country: Europe (0.46)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
(2 more...)