Bayesian Learning


Reviews: Iterative Refinement of the Approximate Posterior for Directed Belief Networks

Neural Information Processing Systems

The paper is very clearly written and describes technical concepts in a very comprehensible way. The approach is sound and well motivated and the experimental comparisons with other approaches are fair, though they could have been more extensive in terms of datasets. My greatest concern is about the execution time of the proposed approach, since this is a sequential Monte Carlo method that performs multiple refinement passes for each step of the training process. The authors report convergence curves vs epochs but not vs wall clock time, which should be provided as the main motivation of the paper is to speed up training for this class of generative methods. The experimental section is good in terms of which methods it compares against, but a bit lacking in terms of datasets.


Reviews: Learning under uncertainty: a comparison between R-W and Bayesian approach

Neural Information Processing Systems

This is an interesting modeling and model comparison paper, providing insights into the processing of uncertainty during learning and decision making. The paper combines advances that could be interesting to both experimental and modeling audiences. However, its clarity should be improved and parameter estimation details explained much better for the paper to be acceptable to NIPS. More specifically:
- Why should highly volatile environments have high learning rates (line 2 of page 2)? Couldn't it plausibly lead to excessive weight instability?


Reviews: Learning Bayesian networks with ancestral constraints

Neural Information Processing Systems

Given ancestral constraints, some pruning of the search tree is possible. Lemma 3 (supplementary material) is the key result here. I believe it to be true, but I don't understand the proof. The phrase "By the EC tree edge generation rules, G_k also contains edge Z - W" needs more explanation. In addition there are implied constraints ("implied constraints" is the standard terminology, here they are called "projected constraints").


Extended Bayesian Information Criteria for Gaussian Graphical Models

Neural Information Processing Systems

Gaussian graphical models with sparsity in the inverse covariance matrix are of significant interest in many modern applications. For the problem of recovering the graphical structure, information criteria provide useful optimization objectives for algorithms searching through sets of graphs or for selection of tuning parameters of other methods such as the graphical lasso, which is a likelihood penalization technique. In this paper we establish the asymptotic consistency of an extended Bayesian information criterion for Gaussian graphical models in a scenario where both the number of variables p and the sample size n grow. Compared to earlier work on the regression case, our treatment allows for growth in the number of non-zero parameters in the true model, which is necessary in order to cover connected graphs. We demonstrate the performance of this criterion on simulated data when used in conjunction with the graphical lasso, and verify that the criterion indeed performs better than either cross-validation or the ordinary Bayesian information criterion when p and the number of non-zero parameters q both scale with n.
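As a rough illustration of how such a criterion could score a candidate graph, the sketch below assumes the commonly cited form EBIC_gamma = -2 * loglik + |E| * log(n) + 4 * |E| * gamma * log(p), where |E| is the number of edges. The abstract does not state this formula, so the exact form, the function names, and the default gamma are assumptions made for illustration only.

```python
import numpy as np

def gaussian_loglik(S, Theta, n):
    """Gaussian log-likelihood up to an additive constant:
    (n / 2) * (log det Theta - trace(S @ Theta))."""
    sign, logdet = np.linalg.slogdet(Theta)
    if sign <= 0:
        raise ValueError("Theta must be positive definite")
    return 0.5 * n * (logdet - np.trace(S @ Theta))

def count_edges(Theta, tol=1e-8):
    """Count non-zero off-diagonal entries in the upper triangle."""
    upper = np.triu(Theta, k=1)
    return int(np.sum(np.abs(upper) > tol))

def ebic(S, Theta, n, gamma=0.5):
    """Assumed EBIC form; gamma = 0 recovers the ordinary BIC penalty."""
    p = S.shape[0]
    k = count_edges(Theta)
    return (-2.0 * gaussian_loglik(S, Theta, n)
            + k * np.log(n)
            + 4.0 * k * gamma * np.log(p))
```

In use, one would evaluate `ebic` on the precision matrix estimated at each graphical lasso penalty level and keep the estimate with the smallest value.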


Bayesian nonparametric models for bipartite graphs

Neural Information Processing Systems

We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, an Indian Buffet-like generative process for network growth, and a simple and efficient Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks.


Spatial Normalized Gamma Processes

Neural Information Processing Systems

Dependent Dirichlet processes (DPs) are dependent sets of random measures, each being marginally Dirichlet process distributed. They are used in Bayesian nonparametric models when the usual exchangeability assumption does not hold. We propose a simple and general framework to construct dependent DPs by marginalizing and normalizing a single gamma process over an extended space. The result is a set of DPs, each located at a point in a space such that neighboring DPs are more dependent. We describe Markov chain Monte Carlo inference, involving the typical Gibbs sampling and three different Metropolis-Hastings proposals to speed up convergence.
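The normalization idea can be sketched in finite dimensions, where normalizing independent Gamma(a_i, 1) variables yields a Dirichlet(a_1, ..., a_K) vector. The toy below is only an assumed discretization, not the paper's construction: atoms sit at random positions on a line, and the measure at location `t` normalizes the gamma weights of atoms within a window around `t`, so nearby locations share atoms and are therefore dependent. The window shape, `width`, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

n_atoms = 200
# Hypothetical atom positions in a 1-D auxiliary space.
atom_pos = rng.uniform(0.0, 10.0, size=n_atoms)
# Independent gamma weights; shape 0.5 per atom is an arbitrary choice.
gamma_w = rng.gamma(shape=0.5, scale=1.0, size=n_atoms)

def local_measure(t, width=2.0):
    """Normalized weights of the atoms within `width` of location t."""
    mask = np.abs(atom_pos - t) <= width
    w = np.where(mask, gamma_w, 0.0)
    return w / w.sum()

m_near_a = local_measure(3.0)
m_near_b = local_measure(3.5)
m_far = local_measure(9.0)

# Number of atoms shared between two locations.
shared_near = int(np.sum((m_near_a > 0) & (m_near_b > 0)))
shared_far = int(np.sum((m_near_a > 0) & (m_far > 0)))
```

Locations whose windows overlap reuse the same gamma weights, which is what induces the dependence; locations with disjoint windows are independent in this toy.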


Self-Correcting Bayesian Optimization through Bayesian Active Learning

Neural Information Processing Systems

Gaussian processes are the model of choice in Bayesian optimization and active learning. Yet, they are highly dependent on cleverly chosen hyperparameters to reach their full potential, and little effort is devoted to finding good hyperparameters in the literature. We demonstrate the impact of selecting good hyperparameters for GPs and present two acquisition functions that explicitly prioritize hyperparameter learning. Statistical distance-based Active Learning (SAL) considers the average disagreement between samples from the posterior, as measured by a statistical distance. SAL outperforms the state-of-the-art in Bayesian active learning on several test functions.


Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

Neural Information Processing Systems

Image captioning aims to describe visual content in natural language. As 'a picture is worth a thousand words', there could be various correct descriptions for an image. However, with maximum likelihood estimation as the training objective, the captioning model is penalized whenever its prediction mismatches with the label. For instance, when the model predicts a word expressing richer semantics than the label, it will be penalized and optimized to prefer more concise expressions, referred to as conciseness optimization. In contrast, predictions that are more concise than labels lead to richness optimization. Such conflicting optimization directions could eventually result in the model generating general descriptions.


How to Turn Your Knowledge Graph Embeddings into Generative Models

Neural Information Processing Systems

Some of the most successful knowledge graph embedding (KGE) models for link prediction – CP, RESCAL, TuckER, ComplEx – can be interpreted as energy-based models. Under this perspective they are not amenable for exact maximum-likelihood estimation (MLE), sampling and struggle to integrate logical constraints. This work re-interprets the score functions of these KGEs as circuits – constrained computational graphs allowing efficient marginalisation. Then, we design two recipes to obtain efficient generative circuit models by either restricting their activations to be non-negative or squaring their outputs. Our interpretation comes with little or no loss of performance for link prediction, while the circuits framework unlocks exact learning by MLE, efficient sampling of new triples, and guarantee that logical constraints are satisfied by design.
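To make the efficient-marginalisation claim concrete, here is a minimal sketch of the non-negative recipe for a CP-style score, assuming separate subject and object embedding tables and random non-negative parameters (all names and sizes are hypothetical, not taken from the paper). Because the score is a sum over rank components of a product of three non-negative factors, the sum over all triples factorizes, so the normalizing constant costs time linear in the number of entities rather than cubic in the vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
n_ent, n_rel, rank = 6, 3, 4

# Non-negative embeddings (assumed): subjects, relations, objects.
E_s = rng.random((n_ent, rank))
W_r = rng.random((n_rel, rank))
E_o = rng.random((n_ent, rank))

def score(s, r, o):
    """CP score: sum over rank of the elementwise product of the three embeddings."""
    return float(np.sum(E_s[s] * W_r[r] * E_o[o]))

def partition_fast():
    """Sum of scores over all triples, using the factorized form."""
    return float(np.sum(E_s.sum(axis=0) * W_r.sum(axis=0) * E_o.sum(axis=0)))

def partition_brute():
    """Same quantity by explicit enumeration, for checking only."""
    return sum(score(s, r, o)
               for s in range(n_ent)
               for r in range(n_rel)
               for o in range(n_ent))

def prob(s, r, o):
    """Normalized probability of a triple under the non-negative model."""
    return score(s, r, o) / partition_fast()
```

The brute-force version exists only to check the factorized sum on a tiny vocabulary; the squaring recipe mentioned in the abstract is not covered by this sketch.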


Learning Causal Models under Independent Changes

Neural Information Processing Systems

In many scientific applications, we observe a system in different conditions in which its components may change, rather than in isolation. In our work, we are interested in explaining the generating process of such a multi-context system using a finite mixture of causal mechanisms. Recent work shows that this causal model is identifiable from data, but is limited to settings where the sparse mechanism shift hypothesis holds and only a subset of the causal conditionals change. As this assumption is not easily verifiable in practice, we study the more general principle that mechanism shifts are independent, which we formalize using the algorithmic notion of independence. We introduce an approach for causal discovery beyond partially directed graphs using Gaussian Process models, and give conditions under which we provably identify the correct causal model.