AITopics | Bayesian Inference

Bayesian Inference

Bayes' theorem lets a program infer the probabilities of likely causes from observed effects, when what it is given are the probabilities of the effects given each cause, together with the prior probabilities of the causes.
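
As a minimal sketch of that inversion, the Python snippet below computes P(cause | effect) from P(effect | cause) and a prior over causes. The function name `posterior` and the diagnostic-test numbers are illustrative assumptions, not taken from any of the papers listed here.

```python
def posterior(prior, likelihood):
    """Return P(cause | effect) for every cause via Bayes' theorem.

    prior:      dict mapping cause -> P(cause)
    likelihood: dict mapping cause -> P(effect | cause)
    """
    # P(effect) = sum over causes of P(effect | cause) * P(cause)
    evidence = sum(prior[c] * likelihood[c] for c in prior)
    if evidence == 0:
        raise ValueError("the observed effect has zero probability under every cause")
    return {c: prior[c] * likelihood[c] / evidence for c in prior}


# Hypothetical numbers: a rare condition and an imperfect test that came back positive.
prior = {"disease": 0.01, "healthy": 0.99}
likelihood = {"disease": 0.95, "healthy": 0.05}  # P(positive test | cause)
post = posterior(prior, likelihood)
print(post)
```

With these assumed numbers the posterior probability of "disease" stays well below one half, because the prior is small compared with the false-positive rate.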

Stochastic Gradient Hamiltonian Monte Carlo

Chen, Tianqi, Fox, Emily B., Guestrin, Carlos

arXiv.org Machine Learning | May-12-2014

Hamiltonian Monte Carlo (HMC) sampling methods provide a mechanism for defining distant proposals with high acceptance probabilities in a Metropolis-Hastings framework, enabling more efficient exploration of the state space than standard random-walk proposals. The popularity of such methods has grown significantly in recent years. However, a limitation of HMC methods is the required gradient computation for simulation of the Hamiltonian dynamical system; such computation is infeasible in problems involving a large sample size or streaming data. Instead, we must rely on a noisy gradient estimate computed from a subset of the data. In this paper, we explore the properties of such a stochastic gradient HMC approach. Surprisingly, the natural implementation of the stochastic approximation can be arbitrarily bad. To address this problem we introduce a variant that uses second-order Langevin dynamics with a friction term that counteracts the effects of the noisy gradient, maintaining the desired target distribution as the invariant distribution. Results on simulated data validate our theory. We also provide an application of our methods to a classification task using neural networks and to online Bayesian matrix factorization.
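
The friction idea can be illustrated with a small Python sketch. It is not the authors' code: it applies a friction-damped momentum update with a minibatch gradient to a toy problem (the posterior of a Gaussian mean with unit noise variance). The step size `eta`, friction `alpha`, prior variance, and the choice to ignore any estimate of the gradient-noise term are all assumptions made for illustration.

```python
import random


def sghmc_gaussian_mean(data, n_steps=5000, batch_size=100,
                        eta=1e-5, alpha=0.1, prior_var=100.0, seed=0):
    """Draw approximate posterior samples of a Gaussian mean (unit noise variance).

    Uses a friction-based stochastic gradient update; eta, alpha and prior_var
    are illustrative values, not tuned recommendations.
    """
    rng = random.Random(seed)
    n = len(data)
    theta, v = 0.0, 0.0
    # Injected noise; the gradient-noise estimate is assumed to be zero here.
    noise_std = (2.0 * alpha * eta) ** 0.5
    samples = []
    for _ in range(n_steps):
        theta += v
        batch = rng.sample(data, batch_size)
        # Minibatch gradient of U(theta) = -log posterior, rescaled to the full data size.
        grad = (n / batch_size) * sum(theta - x for x in batch) + theta / prior_var
        # Gradient step, friction (-alpha * v), and injected Gaussian noise.
        v += -eta * grad - alpha * v + rng.gauss(0.0, noise_std)
        samples.append(theta)
    return samples


# Synthetic data with true mean 2.0 (hypothetical example).
data_rng = random.Random(1)
data = [data_rng.gauss(2.0, 1.0) for _ in range(1000)]
samples = sghmc_gaussian_mean(data)
kept = samples[1000:]  # discard burn-in
posterior_mean_estimate = sum(kept) / len(kept)
print(posterior_mean_estimate)
```

Because the minibatch gradient noise is not estimated or corrected for in this sketch, the spread of the samples is only approximate; the mean of the retained samples should still land close to the data mean.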

artificial intelligence, bayesian inference, machine learning, (18 more...)

arXiv.org Machine Learning

1402.4102

Country: North America > United States (0.46)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.75)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Learning modular structures from network data and node variables

Azizi, Elham, Galagan, James E., Airoldi, Edoardo M.

arXiv.org Machine Learning | May-11-2014

A standard technique for understanding underlying dependency structures among a set of variables posits a shared conditional probability distribution for the variables measured on individuals within a group. This approach is often referred to as module networks, where individuals are represented by nodes in a network, groups are termed modules, and the focus is on estimating the network structure among modules. However, estimation solely from node-specific variables can lead to spurious dependencies, and unverifiable structural assumptions are often used for regularization. Here, we propose an extended model that leverages direct observations about the network in addition to node-specific variables. By integrating complementary data types, we avoid the need for structural assumptions. We illustrate the theoretical and practical significance of the model and develop a reversible-jump MCMC procedure for learning modules and model parameters. We demonstrate the method's accuracy in predicting modular structures from synthetic data and its capability to learn influence structures in Twitter data and regulatory modules in the Mycobacterium tuberculosis gene regulatory network.

artificial intelligence, machine learning, module, (21 more...)

arXiv.org Machine Learning

1405.2566

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.87)
Health & Medicine > Therapeutic Area > Immunology (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
(2 more...)

A Hybrid Monte Carlo Architecture for Parameter Optimization

Brofos, James

arXiv.org Machine Learning | May-9-2014

Much recent research has been conducted in the area of Bayesian learning, particularly with regard to the optimization of hyper-parameters via Gaussian process regression. These methodologies rely chiefly on maximizing the expected improvement of a score function with respect to adjustments in the hyper-parameters. In this work, we present a novel algorithm that exploits notions of confidence intervals and uncertainties to enable the discovery of the best optimum within a targeted region of the parameter space. We demonstrate the efficacy of our algorithm with respect to machine learning problems and show cases where our algorithm is competitive with the method of maximizing expected improvement.
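
For context, the expected-improvement baseline mentioned above has a standard closed form when the surrogate's prediction at a point is Gaussian. The Python sketch below shows that closed form for a maximization problem. It illustrates the baseline only, not the paper's confidence-interval algorithm, and the candidate means and standard deviations are made-up values standing in for Gaussian process predictions.

```python
import math


def expected_improvement(mu, sigma, best_so_far):
    """Expected improvement over best_so_far for a Gaussian prediction N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic.
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return (mu - best_so_far) * cdf + sigma * pdf


# Hypothetical surrogate predictions (mean, std) for two hyper-parameter settings.
candidates = {"a": (0.80, 0.05), "b": (0.70, 0.30)}
best = 0.78
scores = {name: expected_improvement(m, s, best) for name, (m, s) in candidates.items()}
print(scores)
```

In this assumed example, candidate "b" scores higher despite its lower predicted mean, because its larger uncertainty leaves more room for improvement over the current best.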

artificial intelligence, bayesian inference, machine learning, (15 more...)

arXiv.org Machine Learning

1405.2377

Country: North America > United States > California (0.14)

Genre: Research Report (0.83)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Reproducing kernel Hilbert space based estimation of systems of ordinary differential equations

González, Javier, Vujačić, Ivan, Wit, Ernst

arXiv.org Machine Learning | May-8-2014

Nonlinear systems of differential equations have attracted interest in fields like systems biology, ecology or biochemistry, due to their flexibility and their ability to describe dynamical systems. Despite the importance of such models in many branches of science, they have not been the focus of systematic statistical analysis until recently. In this work we propose a general approach to estimate the parameters of systems of differential equations measured with noise. Our methodology is based on the maximization of the penalized likelihood where the system of differential equations is used as a penalty. To do so, we use a Reproducing Kernel Hilbert Space approach that allows us to formulate the estimation problem as an unconstrained numeric maximization problem that is easy to solve. The proposed method is tested with synthetically simulated data and it is used to estimate the unobserved transcription factor CdaR in Streptomyces coelicolor using gene expression data of the genes it regulates.

Keywords: System of ordinary differential equations, differential operator, reproducing kernel Hilbert space, gene regulatory network

1. Introduction

Despite the fact that differential equations are a common modelling tool within science and engineering, statistical methods for estimating such models have only received widespread attention during the last few years. The difficulty of solving differential equations in general has been a major stumbling block for efficient statistical procedures.

artificial intelligence, bayesian inference, machine learning, (16 more...)

arXiv.org Machine Learning

1311.3576

Country: North America > United States (0.14)

Genre: Research Report (0.50)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Factored Performance Functions with Structural Representation in Continuous Time Bayesian Networks

Sturlaugson, Liessman (Montana State University) | Sheppard, John W. (Montana State University)

AAAI Conferences | May-7-2014

The continuous time Bayesian network (CTBN) is a probabilistic graphical model that enables reasoning about complex, interdependent, and continuous-time subsystems. The model uses nodes to denote subsystems and arcs to denote conditional dependence. This dependence manifests in how the dynamics of a subsystem change based on the current states of its parents in the network. While the original CTBN definition allows users to specify the dynamics of how the system evolves, users might also want to place value expressions over the dynamics of the model in the form of performance functions. We formalize these performance functions for the CTBN and show how they can be factored in the same way as the network, allowing what we argue is a more intuitive and explicit representation. For cases in which a performance function must involve multiple nodes, we show how to augment the structure of the CTBN to account for the performance interaction while maintaining the factorization of a single performance function for each node.

continuous time bayesian network, factored performance function, structural representation

AAAI Conferences

The Twenty-Seventh International FLAIRS Conference

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.60)

An Empirical Evaluation of Costs and Benefits of Simplifying Bayesian Networks by Removing Weak Arcs

Ratnapinda, Parot (University of Pittsburgh) | Druzdzel, Marek J. (University of Pittsburgh and Białystok University of Technology Białystok)

AAAI Conferences | May-7-2014

We report the results of an empirical evaluation of structural simplification of Bayesian networks by removing weak arcs. We conduct a series of experiments on six networks built from real data sets selected from the UC Irvine Machine Learning Repository. We systematically remove arcs from the weakest to the strongest, relying on four measures of arc strength, and measure the classification accuracy of the resulting simplified models. Our results show that removing up to roughly 20 percent of the weakest arcs in a network has minimal effect on its classification accuracy. At the same time, structural simplification of networks leads to significant reduction of both the amount of memory taken by the clique tree and the amount of computation needed to perform inference.

empirical evaluation, removing weak arc, simplifying bayesian network, (1 more...)

AAAI Conferences

The Twenty-Seventh International FLAIRS Conference

Genre: Research Report > New Finding (0.53)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.60)

Why (and When and How) Contrastive Divergence Works

Fellows, Ian E

arXiv.org Machine Learning | May-3-2014

Contrastive divergence (CD) is a promising method of inference in high dimensional distributions with intractable normalizing constants; however, the theoretical foundations justifying its use are somewhat shaky. This document proposes a framework for understanding CD inference and how and when it works, and provides multiple justifications for the CD moment conditions, including framing them as a variational approximation. Algorithms for performing inference are discussed and are applied to social network data using an exponential-family random graph model (ERGM). The framework also provides guidance about how to construct MCMC kernels providing good CD inference, which turn out to be quite different from those typically used to provide fast global mixing.

artificial intelligence, machine learning, objective function, (18 more...)

arXiv.org Machine Learning

1405.0602

Genre: Research Report (0.84)

Industry: Information Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.71)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.48)

Cover Tree Bayesian Reinforcement Learning

Tziortziotis, Nikolaos, Dimitrakakis, Christos, Blekas, Konstantinos

arXiv.org Machine Learning | May-2-2014

This paper proposes an online tree-based Bayesian approach for reinforcement learning. For inference, we employ a generalised context tree model. This defines a distribution on multivariate Gaussian piecewise-linear models, which can be updated in closed form. The tree structure itself is constructed using the cover tree method, which remains efficient in high dimensional spaces. We combine the model with Thompson sampling and approximate dynamic programming to obtain effective exploration policies in unknown environments. The flexibility and computational simplicity of the model render it suitable for many reinforcement learning problems in continuous state spaces. We demonstrate this in an experimental comparison with a Gaussian process model, a linear model and simple least squares policy iteration.
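
Thompson sampling, which the abstract combines with the tree model, can be shown in its simplest setting. The Python sketch below uses a Beta-Bernoulli bandit rather than the paper's cover tree model or any reinforcement learning environment; the arm probabilities, the Beta(1, 1) prior, and the function name are illustrative assumptions.

```python
import random


def thompson_bernoulli(true_probs, n_rounds=2000, seed=0):
    """Run Thompson sampling on a Bernoulli bandit and return pull counts per arm."""
    rng = random.Random(seed)
    k = len(true_probs)
    successes = [1] * k  # Beta(1, 1) prior for every arm
    failures = [1] * k
    pulls = [0] * k
    for _ in range(n_rounds):
        # Sample one plausible success rate per arm from its current posterior.
        draws = [rng.betavariate(successes[i], failures[i]) for i in range(k)]
        # Act greedily with respect to the sampled rates.
        arm = max(range(k), key=lambda i: draws[i])
        reward = 1 if rng.random() < true_probs[arm] else 0
        # Conjugate posterior update for the chosen arm.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


# Hypothetical arm success probabilities, unknown to the learner.
pulls = thompson_bernoulli([0.2, 0.5, 0.8])
print(pulls)
```

Exploration arises because each decision is greedy with respect to a random draw from the posterior, so uncertain arms are still tried occasionally while the pulls concentrate on the arm that looks best.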

artificial intelligence, machine learning, reinforcement learning, (16 more...)

arXiv.org Machine Learning

1305.1809

Country:

Europe (0.93)
North America > United States (0.68)

Genre: Research Report > New Finding (0.46)

Industry: Education (0.48)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Nested Hierarchical Dirichlet Processes

Paisley, John, Wang, Chong, Blei, David M., Jordan, Michael I.

arXiv.org Machine Learning | May-2-2014

We develop a nested hierarchical Dirichlet process (nHDP) for hierarchical topic modeling. The nHDP is a generalization of the nested Chinese restaurant process (nCRP) that allows each word to follow its own path to a topic node according to a document-specific distribution on a shared tree. This alleviates the rigid, single-path formulation of the nCRP, allowing a document to more easily express thematic borrowings as a random effect. We derive a stochastic variational inference algorithm for the model, in addition to a greedy subtree selection method for each document, which allows for efficient inference using massive collections of text documents. We demonstrate our algorithm on 1.8 million documents from The New York Times and 3.3 million documents from Wikipedia.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

doi: 10.1109/TPAMI.2014.2318728

1210.6738

Country:

Europe (1.00)
Asia > Middle East (1.00)
North America > United States > California (0.28)

Genre: Research Report (0.50)

Industry:

Leisure & Entertainment > Sports (1.00)
Law (1.00)
Health & Medicine (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Communications (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

Exchangeable Variable Models

Niepert, Mathias, Domingos, Pedro

arXiv.org Artificial Intelligence | May-2-2014

A sequence of random variables is exchangeable if its joint distribution is invariant under variable permutations. We introduce exchangeable variable models (EVMs) as a novel class of probabilistic models whose basic building blocks are partially exchangeable sequences, a generalization of exchangeable sequences. We prove that a family of tractable EVMs is optimal under zero-one loss for a large class of functions, including parity and threshold functions, and strictly subsumes existing tractable independence-based model families. Extensive experiments show that EVMs outperform state of the art classifiers such as SVMs and probabilistic models which are solely based on independence assumptions.
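
The definition of exchangeability in the first sentence can be checked directly on a small discrete distribution. The Python sketch below is a brute-force check under the assumption that the joint distribution is given as an explicit table; it does not implement the paper's EVMs, and the two example distributions are made up.

```python
from itertools import permutations, product


def is_exchangeable(joint, tol=1e-12):
    """Check that a joint distribution is invariant under variable permutations.

    joint maps equal-length outcome tuples to probabilities; outcomes that are
    missing from the table are treated as having probability zero.
    """
    for outcome, p in joint.items():
        for permuted in permutations(outcome):
            if abs(joint.get(permuted, 0.0) - p) > tol:
                return False
    return True


# A distribution over three binary variables that depends only on the number of ones.
per_count = {0: 0.4, 1: 0.1, 2: 0.05, 3: 0.15}
count_based = {x: per_count[sum(x)] for x in product((0, 1), repeat=3)}

# A distribution over two binary variables where the order of the values matters.
order_dependent = {(0, 0): 0.4, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.2}

print(is_exchangeable(count_based))
print(is_exchangeable(order_dependent))
```

The check enumerates every permutation of every outcome, so it is only practical for a handful of variables.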

artificial intelligence, machine learning, sequence, (18 more...)

arXiv.org Artificial Intelligence

1405.0501

Country: North America > United States (0.93)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.93)
