AITopics | Bayesian Inference

Bayesian Inference

Bayes' theorem allows a program to infer the probabilities of likely causes from observed effects, when what it is given are the probabilities of the effects given each cause.
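To make the definition concrete, here is a minimal Python sketch that combines the probabilities of effects given causes with a prior over causes to get the posterior probability of each cause. The wet-lawn scenario and its numbers are hypothetical and only illustrate the arithmetic.

```python
def posterior_over_causes(prior, likelihood, effect):
    """Return P(cause | effect) for every cause via Bayes' theorem.

    prior:      dict mapping cause -> P(cause)
    likelihood: dict mapping cause -> {effect: P(effect | cause)}
    """
    # Numerator of Bayes' theorem: P(effect | cause) * P(cause).
    joint = {c: likelihood[c][effect] * p for c, p in prior.items()}
    # Evidence P(effect): total probability of the effect over all causes.
    evidence = sum(joint.values())
    return {c: j / evidence for c, j in joint.items()}


# Hypothetical example: two possible causes of a wet lawn.
prior = {"rain": 0.2, "sprinkler": 0.8}
likelihood = {
    "rain": {"wet": 0.9, "dry": 0.1},
    "sprinkler": {"wet": 0.3, "dry": 0.7},
}
post = posterior_over_causes(prior, likelihood, "wet")
print(post)  # rain ≈ 0.43, sprinkler ≈ 0.57
```

Even though the sprinkler is the more common cause a priori, observing a wet lawn shifts probability toward rain because rain explains the effect much better.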

DOLDA - a regularized supervised topic model for high-dimensional multi-class regression

Magnusson, Måns, Jonsson, Leif, Villani, Mattias

arXiv.org Machine Learning, Oct-20-2016

Over the last decades, more and more textual data have become available, creating a growing need to statistically analyze large amounts of text. The hugely popular Latent Dirichlet Allocation (LDA) model introduced by Blei et al. (2003) is a generative probability model in which each document is summarized by a set of latent semantic themes, often called topics; formally, a topic is a probability distribution over the vocabulary. An estimated LDA model is therefore a compressed latent representation of the documents. LDA is a mixed membership model: each document is a mixture of topics, and each word (token) in a document belongs to a single topic. The basic LDA model is unsupervised, i.e. the topics are learned solely from the words in the documents without access to document labels. In many situations there is also other information we would like to incorporate when modeling a corpus of documents. A common example is labeled documents, such as movie ratings together with a movie description, illness categories in medical journals, or the location of the identified bug together with bug reports. In these situations, one can use a so-called supervised topic model to find the semantic structure in the documents that is related to the class of interest. One of the first approaches to supervised topic models was proposed by McAuliffe and Blei (2008).
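The LDA generative story described in this abstract can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: symmetric Dirichlet priors with hypothetical hyperparameters `alpha` and `beta`, a fixed document length, and Dirichlet draws built from normalized Gamma samples in the standard library. It covers unsupervised LDA only, not the supervised DOLDA model.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma(alpha_k, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_corpus(n_docs, doc_len, vocab, n_topics, alpha=0.5, beta=0.1, seed=0):
    """Simulate (word, topic) pairs from the LDA generative process."""
    rng = random.Random(seed)
    # Each topic is a probability distribution over the vocabulary.
    topics = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(n_topics)]
    corpus = []
    for _ in range(n_docs):
        # Each document gets its own mixture of topics (mixed membership).
        theta = sample_dirichlet([alpha] * n_topics, rng)
        doc = []
        for _ in range(doc_len):
            # Each token is assigned to a single topic z, then drawn from that topic.
            z = rng.choices(range(n_topics), weights=theta, k=1)[0]
            w = rng.choices(vocab, weights=topics[z], k=1)[0]
            doc.append((w, z))
        corpus.append(doc)
    return topics, corpus


vocab = ["film", "plot", "actor", "bug", "crash", "patch"]
topics, corpus = generate_corpus(n_docs=3, doc_len=8, vocab=vocab, n_topics=2)
print([w for w, _ in corpus[0]])
```

Inference in LDA runs this story in reverse, recovering `topics` and each document's `theta` from the words alone.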

artificial intelligence, machine learning, natural language

1602.0026

Country: Europe > Sweden (0.29)

Genre: Research Report (0.64)

Industry: Media > Film (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Deep Amortized Inference for Probabilistic Programs

Ritchie, Daniel, Horsfall, Paul, Goodman, Noah D.

arXiv.org Machine Learning, Oct-18-2016

Probabilistic programming languages (PPLs) are a powerful modeling tool, able to represent any computable probability distribution. Unfortunately, probabilistic program inference is often intractable, and existing PPLs mostly rely on expensive, approximate sampling-based methods. To alleviate this problem, one could try to learn from past inferences, so that future inferences run faster. This strategy is known as amortized inference; it has recently been applied to Bayesian networks and deep generative models. This paper proposes a system for amortized inference in PPLs. In our system, amortization comes in the form of a parameterized guide program. Guide programs have similar structure to the original program, but can have richer data flow, including neural network components. These networks can be optimized so that the guide approximately samples from the posterior distribution defined by the original program. We present a flexible interface for defining guide programs and a stochastic gradient-based scheme for optimizing guide parameters, as well as some preliminary results on automatically deriving guide programs. We explore in detail the common machine learning pattern in which a 'local' model is specified by 'global' random values and used to generate independent observed data points; this gives rise to amortized local inference supporting global model learning.

gaussian, guide program, inference

1610.05735

Country:

North America > United States (0.14)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Modeling the Dynamics of Online Learning Activity

Mavroforakis, Charalampos, Valera, Isabel, Rodriguez, Manuel Gomez

arXiv.org Machine Learning, Oct-18-2016

Learning has become an online activity: people routinely use a wide variety of online learning platforms, ranging from wikis and question answering (Q&A) sites to online communities and blogs, to learn about a broad range of topics. In this context, people find solutions to their problems by looking for closely related pieces of information, executing a sequence of queries or, more generally, performing a series of online actions. For example, a high school student may study several closely related wiki pages to prepare an essay about a historical event; a software developer may read several answers within a Q&A site to solve a specific programming problem; and a researcher may check a specialized blog written by one of her peers to learn about a new concept or technique. All the above are examples of learning patterns, in which people perform a series of online actions (reading a wiki page, an answer, or a blog) to achieve a predefined goal (writing an essay, solving a programming problem, or learning about a new concept or technique). In this context, one may expect that people with similar goals undertake similar sequences of online actions and thus adopt similar learning patterns. Therefore, one could leverage the vast availability of online traces of users' learning activity to disambiguate among interleaved learning patterns adopted by individuals over time, as well as to automatically identify and track those people's interests and goals over time. In this work, we introduce a novel probabilistic model, the Hierarchical Dirichlet Hawkes Process (HDHP), for clustering continuous-time grouped streaming data, which we use to uncover the dynamics of learning activity on the web.
The HDHP leverages the properties of the Hierarchical Dirichlet Process (HDP) [18], a popular Bayesian nonparametric model for clustering problems involving multiple groups of data, combined with the Hawkes process [13], a temporal point process particularly well suited to modeling social activity [11, 19, 20]. In particular, the former is used to account for an infinite number of learning patterns, which are shared across users (groups) of an online learning platform.
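As a hedged illustration of the Hawkes half of the model only, the sketch below assumes a univariate Hawkes process with an exponential triggering kernel, $\lambda(t) = \mu + \alpha \sum_{t_i < t} e^{-\delta (t - t_i)}$, and simulates it with Ogata's thinning algorithm. The kernel choice and the parameter names `mu`, `alpha` and `delta` are assumptions for illustration; this is not the HDHP itself, which additionally clusters events into shared learning patterns.

```python
import math
import random


def intensity(t, history, mu, alpha, delta):
    """Exponential-kernel Hawkes intensity (assumed kernel, for illustration).

    Events at exactly time t are included, which gives the right-limit value
    used as the thinning bound; for candidate times strictly after the last
    event this agrees with the t_i < t definition.
    """
    return mu + alpha * sum(math.exp(-delta * (t - ti)) for ti in history if ti <= t)


def simulate_hawkes(mu, alpha, delta, t_max, seed=0):
    """Ogata's thinning algorithm; assumes alpha / delta < 1 so the process is stable."""
    rng = random.Random(seed)
    events, t = [], 0.0
    while True:
        # Between events the exponential kernel only decays, so the current
        # intensity is an upper bound until the next accepted event.
        lam_bar = intensity(t, events, mu, alpha, delta)
        t += rng.expovariate(lam_bar)  # candidate event time
        if t > t_max:
            return events
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, delta):
            events.append(t)


events = simulate_hawkes(mu=0.5, alpha=0.8, delta=1.0, t_max=50.0, seed=1)
print(len(events), "events; self-excitation makes them arrive in bursts")
```

Each accepted event raises the intensity by `alpha`, which then decays at rate `delta`; this self-excitation is what makes the process a natural fit for bursty online activity.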

artificial intelligence, bayesian inference, machine learning

1610.05775

Genre:

Instructional Material (0.92)
Research Report > New Finding (0.46)

Industry:

Education > Educational Setting > Online (0.81)
Education > Educational Setting > K-12 Education > Secondary School (0.54)

Technology:

Information Technology > Communications (1.00)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.81)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)

Modeling community structure and topics in dynamic text networks

Henry, Teague, Banks, David, Chai, Christine, Owens-Oas, Derek

arXiv.org Machine Learning, Oct-18-2016

Dynamic text networks have been widely studied in recent years, primarily because the Internet stores textual data in a way that allows links between different documents. Articles on Wikipedia (Hoffman et al., 2010), citation networks in journal articles (Moody, 2004), and linked blog posts (Latouche et al., 2011) are examples of dynamic text networks, or networks of documents that are generated over time. But each application has idiosyncratic features, such as the structure of the links and the nature of the time-varying documents, so analysis typically requires bespoke models that directly address those aspects.

data mining, machine learning, natural language

1610.05756

Country:

North America > United States (1.00)
Asia > Middle East (0.67)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Law (1.00)
Government > Voting & Elections (1.00)
Government > Regional Government > North America Government > United States Government (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.93)

Fast Sampling for Bayesian Max-Margin Models

Hu, Wenbo, Zhu, Jun, Zhang, Bo

arXiv.org Artificial Intelligence, Oct-18-2016

Bayesian max-margin models have shown superior performance in various practical applications, such as text categorization, collaborative prediction, social network link prediction and crowdsourcing, and they combine the flexibility of Bayesian modeling with the predictive strengths of max-margin learning. However, Monte Carlo sampling for these models remains challenging, especially for applications that involve large-scale datasets. In this paper, we present stochastic subgradient Hamiltonian Monte Carlo (HMC) methods, which are easy to implement and computationally efficient. We show that subgradient HMC satisfies an approximate detailed balance property, which reveals it to be a natural and valid generalization of ordinary HMC. Furthermore, we investigate variants that use stochastic subsampling and thermostats for better scalability and mixing. Using stochastic subgradient Markov Chain Monte Carlo (MCMC), we efficiently solve the posterior inference task for various Bayesian max-margin models, and extensive experimental results demonstrate the effectiveness of our approach.
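To illustrate the core idea of running HMC with a subgradient where the hinge loss is not differentiable, here is a minimal sketch for a hypothetical one-dimensional Bayesian max-margin model with potential $U(w) = w^2/(2\sigma_0^2) + C \sum_i \max(0, 1 - y_i w x_i)$, i.e. a Gaussian prior plus a hinge-loss pseudo-likelihood. It uses the full dataset rather than stochastic subsamples, omits thermostats, and adds a standard Metropolis-Hastings correction; these are simplifying assumptions, not the authors' algorithm.

```python
import math
import random


def potential(w, data, C=1.0, prior_var=1.0):
    """U(w) = w^2 / (2 * prior_var) + C * sum_i max(0, 1 - y_i * w * x_i)."""
    hinge = sum(max(0.0, 1.0 - y * w * x) for x, y in data)
    return w * w / (2.0 * prior_var) + C * hinge


def subgradient(w, data, C=1.0, prior_var=1.0):
    """One valid subgradient of U at w (a term sitting exactly on the kink contributes 0)."""
    g = w / prior_var
    for x, y in data:
        if 1.0 - y * w * x > 0.0:  # margin is violated, hinge is active
            g -= C * y * x
    return g


def subgradient_hmc(data, n_samples, step=0.05, n_leapfrog=20, seed=0):
    rng = random.Random(seed)
    w, samples = 0.0, []
    for _ in range(n_samples):
        p0 = rng.gauss(0.0, 1.0)  # resample momentum
        w_new, p = w, p0
        # Leapfrog integration, with the subgradient standing in for the gradient.
        p -= 0.5 * step * subgradient(w_new, data)
        for i in range(n_leapfrog):
            w_new += step * p
            if i < n_leapfrog - 1:
                p -= step * subgradient(w_new, data)
        p -= 0.5 * step * subgradient(w_new, data)
        # Metropolis-Hastings correction on the total energy H = U + p^2 / 2.
        h_old = potential(w, data) + 0.5 * p0 * p0
        h_new = potential(w_new, data) + 0.5 * p * p
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            w = w_new
        samples.append(w)
    return samples


# Hypothetical 1-D labeled data (x, y) with y in {-1, +1}.
data = [(1.0, 1), (2.0, 1), (-1.5, -1), (-0.5, -1)]
samples = subgradient_hmc(data, n_samples=200)
print("posterior mean of w:", sum(samples) / len(samples))
```

Because each leapfrog half-step still depends only on the current position, the integrator stays volume-preserving and reversible, which is why the Metropolis-Hastings step here keeps the target exact even with a subgradient.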

artificial intelligence, classifier, machine learning

1504.07107

Genre: Research Report > New Finding (0.88)

Industry:

Information Technology (0.48)
Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.89)

Spatio-temporal Gaussian processes modeling of dynamical systems in systems biology

Niu, Mu, Dai, Zhenwen, Lawrence, Neil, Becker, Kolja

arXiv.org Machine Learning, Oct-17-2016

Quantitative modeling of post-transcriptional regulation processes is a challenging problem in systems biology. A mechanical model of the regulatory process needs to be able to describe the available spatio-temporal protein concentration and mRNA expression data and to recover the continuous spatio-temporal fields. Rigorous methods are required to identify model parameters. A promising approach to these difficulties is proposed that uses a Gaussian process as a prior distribution over the latent functions of protein concentration and mRNA expression. In this study, we consider a partial differential equation mechanical model with differential operators and latent functions. Since the operators at stake are linear, the information from the physical model can be encoded into the kernel function. Hybrid Monte Carlo methods are employed to carry out Bayesian inference of the partial differential equation parameters and the Gaussian process kernel parameters. The spatio-temporal fields of protein concentration and mRNA expression are reconstructed without explicitly solving the partial differential equation.

artificial intelligence, machine learning, model parameter

1610.05163

Country: Europe (0.28)

Genre:

Research Report > New Finding (0.34)
Research Report > Promising Solution (0.34)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Modeling & Simulation (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.35)

Black-box Importance Sampling

Liu, Qiang, Lee, Jason D.

arXiv.org Machine Learning, Oct-17-2016

Importance sampling is widely used in machine learning and statistics, but its power is limited by the restriction of using simple proposals for which the importance weights can be tractably calculated. We address this problem by studying black-box importance sampling methods that calculate importance weights for samples generated from any unknown proposal or black-box mechanism. Our method allows us to use better and richer proposals to solve difficult problems, and (somewhat counter-intuitively) also has the additional benefit of improving the estimation accuracy beyond typical importance sampling. Both theoretical and empirical analyses are provided.
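The abstract contrasts its black-box method with typical importance sampling, whose weights $w_i = p(x_i)/q(x_i)$ require evaluating the proposal density $q$. As a hedged baseline sketch (standard self-normalized importance sampling, not the authors' black-box method), the code below uses a Gaussian proposal whose density is known in closed form; the target and proposal parameters are hypothetical.

```python
import math
import random


def log_normal_pdf(x, mean, sd):
    """Log density of N(mean, sd^2) at x."""
    return -0.5 * ((x - mean) / sd) ** 2 - math.log(sd * math.sqrt(2.0 * math.pi))


def self_normalized_is(f, log_target, proposal_mean, proposal_sd, n=20000, seed=0):
    """Estimate E_p[f(X)] using samples from a Gaussian proposal q.

    log_target may be unnormalized: the constant cancels after self-normalization.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(proposal_mean, proposal_sd) for _ in range(n)]
    # Log weights log p(x) - log q(x); this step needs the proposal density q.
    log_w = [log_target(x) - log_normal_pdf(x, proposal_mean, proposal_sd) for x in xs]
    m = max(log_w)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(lw - m) for lw in log_w]
    return sum(wi * f(x) for wi, x in zip(w, xs)) / sum(w)


# Target: unnormalized N(1, 0.5^2); estimate its mean with a wider N(0, 2^2) proposal.
est = self_normalized_is(lambda x: x, lambda x: -0.5 * ((x - 1.0) / 0.5) ** 2, 0.0, 2.0)
print(est)  # close to the true mean 1.0
```

If the samples came from an unknown mechanism, `log_normal_pdf` could not be evaluated and the weights above would be unavailable; that is the restriction the black-box approach is designed to remove.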

artificial intelligence, bayesian inference, machine learning

1610.05247

Genre: Research Report (0.64)

Industry: Transportation > Air (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)

A Bayesian Group Sparse Multi-Task Regression Model for Imaging Genetics

Greenlaw, Keelin, Szefer, Elena, Graham, Jinko, Lesperance, Mary, Nathoo, Farouk S.

arXiv.org Machine Learning, Oct-17-2016

Motivation: Recent advances in technology for brain imaging and high-throughput genotyping have motivated studies examining the influence of genetic variation on brain structure. Wang et al. (Bioinformatics, 2012) have developed an approach for the analysis of imaging genomic studies using penalized multi-task regression with regularization based on a novel group $l_{2,1}$-norm penalty, which encourages structured sparsity at both the gene level and the SNP level. While incorporating a number of useful features, the proposed method only furnishes a point estimate of the regression coefficients; techniques for conducting statistical inference are not provided. A new Bayesian method is proposed here to overcome this limitation. Results: We develop a Bayesian hierarchical modeling formulation where the posterior mode corresponds to the estimator proposed by Wang et al. (Bioinformatics, 2012), and an approach that allows for full posterior inference, including the construction of interval estimates for the regression parameters. We show that the proposed hierarchical model can be expressed as a three-level Gaussian scale mixture, and this representation facilitates the use of a Gibbs sampling algorithm for posterior simulation. Simulation studies demonstrate that the interval estimates obtained using our approach achieve adequate coverage probabilities that outperform those obtained from the nonparametric bootstrap. Our proposed methodology is applied to the analysis of neuroimaging and genetic data collected as part of the Alzheimer's Disease Neuroimaging Initiative (ADNI), and this analysis of the ADNI cohort clearly demonstrates the added value of incorporating interval estimation beyond point estimation alone when relating SNPs to brain imaging endophenotypes.

artificial intelligence, bayesian inference, machine learning

1605.02234

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.93)

Industry:

Health & Medicine > Health Care Technology (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Therapeutic Area > Neurology > Alzheimer's Disease (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

From both sides now: the math of linear regression

#artificialintelligence, Oct-16-2016, 15:01:54 GMT

Linear regression is the most basic and the most widely used technique in machine learning; yet for all its simplicity, studying it can unlock some of the most important concepts in statistics. If you have a basic understanding of linear regression expressed as $\hat{Y} = \theta_0 + \theta_1 X$, but don't have a background in statistics and find statements like "ridge regression is equivalent to the maximum a posteriori (MAP) estimate with a zero-mean Gaussian prior" bewildering, then this post is for you. With the superficial goal of understanding that somewhat obtuse statement, its main objective is to explore the topic, starting from the standard formulation of linear regression, moving on to the probabilistic approach (maximum likelihood formulation), and from there to Bayesian linear regression. I'll use the $\theta$ character throughout to refer to the coefficients (weights) of a regression model, either explicitly broken out as $\theta_0$ and $\theta_1$ for intercept and slope respectively, or just $\theta$ referring to the vector of coefficients. I'll usually use the expression $\theta^T x_i$ for the prediction a model gives at $x_i$, the assumption being that a 1 has been added to the vector of values at $x_i$. In the single-predictor case, we know that the least squares fit is the line that minimizes the sum of the squared distances between observed data and predicted values, i.e. it minimizes the Residual Sum of Squares (RSS): $\mathrm{RSS}(\theta) = \sum_{i=1}^{n} (y_i - \theta^T x_i)^2$. These residuals are pretty important in how we reason about our model.
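The quoted claim about ridge regression can be checked numerically. Assume $y_i \sim \mathcal{N}(\theta x_i, \sigma^2)$ and a prior $\theta \sim \mathcal{N}(0, \tau^2)$. Then the negative log-posterior is $\mathrm{RSS}(\theta)/(2\sigma^2) + \theta^2/(2\tau^2)$ plus a constant, and multiplying by $2\sigma^2$ gives the ridge objective $\mathrm{RSS}(\theta) + \lambda\theta^2$ with $\lambda = \sigma^2/\tau^2$. The minimal sketch below is restricted to a single slope with no intercept so that both estimates have one-line closed forms; the function names and data are hypothetical.

```python
def rss(theta, xs, ys):
    """Residual sum of squares for the no-intercept model y ≈ theta * x."""
    return sum((y - theta * x) ** 2 for x, y in zip(xs, ys))


def ridge_estimate(xs, ys, lam):
    """Minimizer of RSS(theta) + lam * theta^2 (lam = 0 gives ordinary least squares)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def map_estimate(xs, ys, noise_var, prior_var):
    """Posterior mode for y_i ~ N(theta * x_i, noise_var), theta ~ N(0, prior_var).

    Setting the derivative of the log-posterior to zero gives
    sum_i x_i (y_i - theta x_i) / noise_var - theta / prior_var = 0.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return (sxy / noise_var) / (sxx / noise_var + 1.0 / prior_var)


xs, ys = [1.0, 2.0, 3.0], [1.1, 1.9, 3.2]
noise_var, prior_var = 0.5, 2.0
print(ridge_estimate(xs, ys, lam=noise_var / prior_var))
print(map_estimate(xs, ys, noise_var, prior_var))  # same value as the ridge estimate
```

A smaller prior variance $\tau^2$ (a stronger belief that $\theta$ is near zero) corresponds to a larger ridge penalty $\lambda$, pulling the estimate further toward zero.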

artificial intelligence, machine learning, regression

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.90)

Asymptotic Analysis of Objectives based on Fisher Information in Active Learning

Sourati, Jamshid, Akcakaya, Murat, Leen, Todd K., Erdogmus, Deniz, Dy, Jennifer G.

arXiv.org Machine Learning, Oct-14-2016

Obtaining labels can be costly and time-consuming. Active learning allows a learning algorithm to intelligently query samples to be labeled for efficient learning. Fisher information ratio (FIR) has been used as an objective for selecting queries in active learning. However, little is known about the theory behind the use of FIR for active learning. There is a gap between the underlying theory and the motivation of its usage in practice. In this paper, we attempt to fill this gap and provide a rigorous framework for analyzing existing FIR-based active learning methods. In particular, we show that FIR can be asymptotically viewed as an upper bound of the expected variance of the log-likelihood ratio. Additionally, our analysis suggests a unifying framework that not only enables us to make theoretical comparisons among the existing querying methods based on FIR, but also allows us to give insight into the development of new active learning approaches based on this objective.

artificial intelligence, bayesian inference, machine learning

1605.08798

Genre:

Research Report (0.64)
Overview (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)