AITopics | Bayesian Inference

Collaborating Authors

Bayesian Inference

Bayes' Theorem allows a program to infer the probabilities of likely causes from the probabilities of their effects, when what it is given are the probabilities of effects, given the causes.

News Overviews Instructional Materials AI-Alerts Classics

Zero-Truncated Poisson Tensor Factorization for Massive Binary Tensors

Hu, Changwei, Rai, Piyush, Carin, Lawrence

arXiv.org Machine LearningAug-18-2015

We present a scalable Bayesian model for low-rank factorization of massive tensors with binary observations. The proposed model has the following key properties: (1) in contrast to the models based on the logistic or probit likelihood, using a zero-truncated Poisson likelihood for binary data allows our model to scale up in the number of \emph{ones} in the tensor, which is especially appealing for massive but sparse binary tensors; (2) side-information in form of binary pairwise relationships (e.g., an adjacency network) between objects in any tensor mode can also be leveraged, which can be especially useful in "cold-start" settings; and (3) the model admits simple Bayesian inference via batch, as well as \emph{online} MCMC; the latter allows scaling up even for \emph{dense} binary data (i.e., when the number of ones in the tensor/network is also massive). In addition, non-negative factor matrices in our model provide easy interpretability, and the tensor rank can be inferred from the data. We evaluate our model on several large-scale real-world binary tensors, achieving excellent computational scalability, and also demonstrate its usefulness in leveraging side-information provided in form of mode-network(s).

artificial intelligence, machine learning, tensor, (17 more...)

arXiv.org Machine Learning

1508.0421

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine (0.69)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.66)

Add feedback

Tree-Width and the Computational Complexity of MAP Approximations in Bayesian Networks

Kwisthout, Johan

Journal of Artificial Intelligence ResearchAug-17-2015

The problem of finding the most probable explanation to a designated set of variables given partial evidence (the MAP problem) is a notoriously intractable problem in Bayesian networks, both to compute exactly and to approximate. It is known, both from theoretical considerations and from practical experience, that low tree-width is typically an essential prerequisite to efficient exact computations in Bayesian networks. In this paper we investigate whether the same holds for approximating MAP. We define four notions of approximating MAP (by value, structure, rank, and expectation) and argue that all of them are intractable in general. We prove that efficient value-approximations, structure-approximations, and rank-approximations of MAP instances with high tree-width will violate the Exponential Time Hypothesis. In contrast, we show that MAP can sometimes be efficiently expectation-approximated, even in instances with high tree-width, if the most probable explanation has a high probability. We introduce the complexity class FERT, analogous to the class FTP, to capture this notion of fixed-parameter expectation-approximability. We suggest a road-map to future research that yields fixed-parameter tractable results for expectation-approximate MAP, even in graphs with high tree-width.

artificial intelligence, bayesian inference, machine learning, (19 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4794

AI Access Foundation

10953

Journal of Artificial Intelligence Research

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > Gelderland > Nijmegen (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Unbounded Bayesian Optimization via Regularization

Shahriari, Bobak, Bouchard-Côté, Alexandre, de Freitas, Nando

arXiv.org Machine LearningAug-14-2015

Bayesian optimization has recently emerged as a popular and efficient tool for global optimization and hyperparameter tuning. Currently, the established Bayesian optimization practice requires a user-defined bounding box which is assumed to contain the optimizer. However, when little is known about the probed objective function, it can be difficult to prescribe such bounds. In this work we modify the standard Bayesian optimization framework in a principled way to allow automatic resizing of the search space. We introduce two alternative methods and compare them on two common synthetic benchmarking test functions as well as the tasks of tuning the stochastic gradient descent optimizer of a multi-layered perceptron and a convolutional neural network on MNIST.

artificial intelligence, machine learning, optimization, (18 more...)

arXiv.org Machine Learning

1508.03666

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
(2 more...)

Add feedback

A model selection approach for clustering a multinomial sequence with non-negative factorization

Lee, Nam H., Tang, Runze, Priebe, Carey E., Rosen, Michael

arXiv.org Machine LearningAug-14-2015

We consider a problem of clustering a sequence of multinomial observations by way of a model selection criterion. We propose a form of a penalty term for the model selection procedure. Our approach subsumes both the conventional AIC and BIC criteria but also extends the conventional criteria in a way that it can be applicable also to a sequence of sparse multinomial observations, where even within a same cluster, the number of multinomial trials may be different for different observations. In addition, as a preliminary estimation step to maximum likelihood estimation, and more generally, to maximum $L_{q}$ estimation, we propose to use reduced rank projection in combination with non-negative factorization. We motivate our approach by showing that our model selection criterion and preliminary estimation step yield consistent estimates under simplifying assumptions. We also illustrate our approach through numerical experiments using real and simulated data.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

1312.7559

Country: North America > United States (0.28)

Genre: Research Report (0.82)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Add feedback

Bayesian Dropout

Herlau, Tue, Mørup, Morten, Schmidt, Mikkel N.

arXiv.org Machine LearningAug-12-2015

Dropout has recently emerged as a powerful and simple method for training neural networks preventing co-adaptation by stochastically omitting neurons. Dropout is currently not grounded in explicit modelling assumptions which so far has precluded its adoption in Bayesian modelling. Using Bayesian entropic reasoning we show that dropout can be interpreted as optimal inference under constraints. We demonstrate this on an analytically tractable regression model providing a Bayesian interpretation of its mechanism for regularizing and preventing co-adaptation as well as its connection to other Bayesian techniques. We also discuss two general approximate techniques for applying Bayesian dropout for general models, one based on an analytical approximation and the other on stochastic variational techniques. These techniques are then applied to a Baysian logistic regression problem and are shown to improve performance as the model become more misspecified. Our framework roots dropout as a theoretically justified and practical tool for statistical modelling allowing Bayesians to tap into the benefits of dropout training.

artificial intelligence, dropout, machine learning, (14 more...)

arXiv.org Machine Learning

1508.02905

Country: Europe (0.68)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.72)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.69)

Add feedback

Alternating Minimization Algorithm with Automatic Relevance Determination for Transmission Tomography under Poisson Noise

Kaganovsky, Yan, Han, Shaobo, Degirmenci, Soysal, Politte, David G., Brady, David J., O'Sullivan, Joseph A., Carin, Lawrence

arXiv.org Machine LearningAug-11-2015

We propose a globally convergent alternating minimization (AM) algorithm for image reconstruction in transmission tomography, which extends automatic relevance determination (ARD) to Poisson noise models with Beer's law. The algorithm promotes solutions that are sparse in the pixel/voxel-differences domain by introducing additional latent variables, one for each pixel/voxel, and then learning these variables from the data using a hierarchical Bayesian model. Importantly, the proposed AM algorithm is free of any tuning parameters with image quality comparable to standard penalized likelihood methods. Our algorithm exploits optimization transfer principles which reduce the problem into parallel 1D optimization tasks (one for each pixel/voxel), making the algorithm feasible for large-scale problems. This approach considerably reduces the computational bottleneck of ARD associated with the posterior variances. Positivity constraints inherent in transmission tomography problems are also enforced. We demonstrate the performance of the proposed algorithm for x-ray computed tomography using synthetic and real-world datasets. The algorithm is shown to have much better performance than prior ARD algorithms based on approximate Gaussian noise models, even for high photon flux.

algorithm, optimization problem, upstream oil & gas, (19 more...)

arXiv.org Machine Learning

1412.8464

Country:

North America > United States (0.92)
Asia > Middle East (0.14)

Genre: Research Report (1.00)

Industry:

Energy > Oil & Gas > Upstream (0.91)
Health & Medicine > Diagnostic Medicine > Imaging (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

A Knowledge Gradient Policy for Sequencing Experiments to Identify the Structure of RNA Molecules Using a Sparse Additive Belief Model

Li, Yan, Reyes, Kristofer G., Vazquez-Anderson, Jorge, Wang, Yingfei, Contreras, Lydia M., Powell, Warren B.

arXiv.org Machine LearningAug-6-2015

We present a sparse knowledge gradient (SpKG) algorithm for adaptively selecting the targeted regions within a large RNA molecule to identify which regions are most amenable to interactions with other molecules. Experimentally, such regions can be inferred from fluorescence measurements obtained by binding a complementary probe with fluorescence markers to the targeted regions. We use a biophysical model which shows that the fluorescence ratio under the log scale has a sparse linear relationship with the coefficients describing the accessibility of each nucleotide, since not all sites are accessible (due to the folding of the molecule). The SpKG algorithm uniquely combines the Bayesian ranking and selection problem with the frequentist $\ell_1$ regularized regression approach Lasso. We use this algorithm to identify the sparsity pattern of the linear model as well as sequentially decide the best regions to test before experimental budget is exhausted. Besides, we also develop two other new algorithms: batch SpKG algorithm, which generates more suggestions sequentially to run parallel experiments; and batch SpKG with a procedure which we call length mutagenesis. It dynamically adds in new alternatives, in the form of types of probes, are created by inserting, deleting or mutating nucleotides within existing probes. In simulation, we demonstrate these algorithms on the Group I intron (a mid-size RNA molecule), showing that they efficiently learn the correct sparsity pattern, identify the most accessible region, and outperform several other policies.

artificial intelligence, bayesian inference, machine learning, (21 more...)

arXiv.org Machine Learning

1508.01551

Country: North America > United States (0.67)

Genre: Research Report (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.87)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.67)

Add feedback

Universal Approximation of Edge Density in Large Graphs

Boullé, Marc

arXiv.org Machine LearningAug-6-2015

With the recent availability of much network data, such as world wide web, social networks, phone call networks, science collaboration graphs [1], [2], there is a renewed interest for the graph partitioning problem, especially for the automatic discovery of community structures in large networks [3], [4], [5]. Beyond clustering approaches, coclustering approaches aim at summarizing the relation between two entities in a many-to-many relationship. Such a relation can be represented as a graph, where the source and target vertices represent entities and the edges stand for relations between entities. A coclustering model provides a summary of a graph by grouping source vertices and target vertices. For example, in market analysis, the source vertices of the graph represent customers, the target vertices represent products and there is one edge each time a customer has purchased a product. A coclustering model summarizes the dataset by grouping customers that have purchased approximately the same products and grouping products that have been purchased by approximately the same customers. Coclustering models have been applied to many other domains, such as information retrieval (the entities are documents and their words in a text corpus), web log analysis (cookies and their visited web pages), web structure analysis (web pages with hyperlinks between them) or telecommunication network (the call detail records stand for the edges in a call graph between a caller and a called party). All these real-world graphs are directed multigraphs, meaning that two entities may be linked by multi-edges. We aim to summarize and discover insightful patterns in such graphs, using a method with the desired following properties: 1) Robustness, to avoid detecting spurious patterns in case of noisy data.

data mining, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

1508.0134

Country:

North America > United States (1.00)
Europe (0.67)

Genre: Research Report > New Finding (0.45)

Industry:

Telecommunications (0.88)
Information Technology (0.68)
Transportation > Air (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Web (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
(4 more...)

Add feedback

A MAP approach for $\ell_q$-norm regularized sparse parameter estimation using the EM algorithm

Carvajal, Rodrigo, Agüero, Juan C., Godoy, Boris I., Katselis, Dimitrios

arXiv.org Machine LearningAug-5-2015

In this paper, Bayesian parameter estimation through the consideration of the Maximum A Posteriori (MAP) criterion is revisited under the prism of the Expectation-Maximization (EM) algorithm. By incorporating a sparsity-promoting penalty term in the cost function of the estimation problem through the use of an appropriate prior distribution, we show how the EM algorithm can be used to efficiently solve the corresponding optimization problem. To this end, we rely on variance-mean Gaussian mixtures (VMGM) to describe the prior distribution, while we incorporate many nice features of these mixtures to our estimation problem. The corresponding MAP estimation problem is completely expressed in terms of the EM algorithm, which allows for handling nonlinearities and hidden variables that cannot be easily handled with traditional methods. For comparison purposes, we also develop a Coordinate Descent algorithm for the $\ell_q$-norm penalized problem and present the performance results via simulations.

artificial intelligence, bayesian inference, machine learning, (19 more...)

arXiv.org Machine Learning

1508.01071

Country:

Europe (0.46)
North America > United States > Illinois (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Add feedback

Unsupervised Learning in Genome Informatics

Wong, Ka-Chun, Li, Yue, Zhang, Zhaolei

arXiv.org Machine LearningAug-3-2015

With different genomes available, unsupervised learning algorithms are essential in learning genome-wide biological insights. Especially, the functional characterization of different genomes is essential for us to understand lives. In this book chapter, we review the state-of-the-art unsupervised learning algorithms for genome informatics from DNA to MicroRNA. DNA (DeoxyriboNucleic Acid) is the basic component of genomes. A significant fraction of DNA regions (transcription factor binding sites) are bound by proteins (transcription factors) to regulate gene expression at different development stages in different tissues. To fully understand genetics, it is necessary of us to apply unsupervised learning algorithms to learn and infer those DNA regions. Here we review several unsupervised learning methods for deciphering the genome-wide patterns of those DNA regions. MicroRNA (miRNA), a class of small endogenous non-coding RNA (RiboNucleic acid) species, regulate gene expression post-transcriptionally by forming imperfect base-pair with the target sites primarily at the 3$'$ untranslated regions of the messenger RNAs. Since the 1993 discovery of the first miRNA \emph{let-7} in worms, a vast amount of studies have been dedicated to functionally characterizing the functional impacts of miRNA in a network context to understand complex diseases such as cancer. Here we review several representative unsupervised learning frameworks on inferring miRNA regulatory network by exploiting the static sequence-based information pertinent to the prior knowledge of miRNA targeting and the dynamic information of miRNA activities implicated by the recently available large data compendia, which interrogate genome-wide expression profiles of miRNAs and/or mRNAs across various cell conditions.

artificial intelligence, machine learning, mirna, (18 more...)

arXiv.org Machine Learning

1508.00459

Country:

North America > United States (1.00)
Europe > United Kingdom > England (0.68)

Genre: Research Report > Experimental Study (0.67)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
(2 more...)

Add feedback