 Bayesian Inference


Reviews: Rényi Divergence Variational Inference

Neural Information Processing Systems

This is a very good and technically sound paper, containing a significant amount of material. The theoretical investigation of the properties of alpha-divergence minimization is thorough, clear and detailed. The paper provides significant theoretical insight and understanding into alpha-divergence minimization and optimization-based approximate inference in general. My biggest concern about the alpha-divergence framework is whether its theoretical richness and elegance actually translate into practical methods. In other words, I'm not sure that its practical aspects are appealing enough to convince practitioners of variational inference to switch to alpha-divergence minimization.


Reviews: Reward Augmented Maximum Likelihood for Neural Structured Prediction

Neural Information Processing Systems

The paper is a superbly written account of a simple idea that appears to work very well. The approach can be applied straightforwardly to existing maximum-likelihood (ML) trained models so that training can, in principle, take the task reward into account, and it is computationally much more efficient than alternative non-ML-based approaches. This work risks being underappreciated as proposing merely a simple addition of artificial structured-label noise, but I think the specific link with the structured-output task reward is sufficiently original, and the paper also uncovers important theoretical insight by revealing the formal relationship between the proposed reward-augmented ML and RL-based regularized expected-reward objectives. So while it works surprisingly well, you haven't yet clearly demonstrated empirically that using a truly *task-reward derived* payoff distribution is beneficial. One way to demonstrate that convincingly would be to carry out your envisioned BLEU importance-reweighted sampling and show that it improves the BLEU test score over your current, simpler edit-distance-based label noise.
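To make the "payoff distribution" mentioned in this review concrete, the following is a minimal sketch, not the authors' implementation. It assumes the payoff distribution is proportional to exp(reward / temperature), uses negative Hamming distance as a stand-in reward, and normalizes over a small explicit candidate list. The function names and the candidate strings are hypothetical.

```python
import math

def hamming_reward(candidate, target):
    """Negative Hamming distance between equal-length sequences (illustrative reward)."""
    return -sum(a != b for a, b in zip(candidate, target))

def payoff_distribution(candidates, target, temperature=1.0):
    """Normalize exp(reward / temperature) over an explicit candidate list."""
    scores = [hamming_reward(c, target) / temperature for c in candidates]
    max_score = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

target = "abc"
candidates = ["abc", "abd", "xbd", "xyz"]  # 0, 1, 2 and 3 mismatches
probs = payoff_distribution(candidates, target, temperature=0.5)

for candidate, prob in zip(candidates, probs):
    print(candidate, round(prob, 4))
```

Lower temperatures concentrate the mass on the ground-truth label, so training on samples from this distribution approaches ordinary ML; higher temperatures spread mass to outputs that are further from the label.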


Reviews: Learning under uncertainty: a comparison between R-W and Bayesian approach

Neural Information Processing Systems

This is an interesting modeling and model comparison paper, providing insights into the processing of uncertainty during learning and decision making. The paper combines advances that could be interesting to both experimental and modeling audiences. However, its clarity should be improved and parameter estimation details explained much better for the paper to be acceptable to NIPS. More specifically:
- Why should highly volatile environments have high learning rates (line 2 of page 2)? Couldn't it plausibly lead to excessive weight instability?
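The reviewer's question about learning rates can be illustrated with a small simulation. This is a sketch under stated assumptions, not the paper's model: it uses the standard Rescorla-Wagner delta rule, a hypothetical noise-free environment whose outcome flips every 10 trials, and a hypothetical stable environment with Bernoulli(0.8) outcomes. All names and constants are illustrative.

```python
import random

def rescorla_wagner(outcomes, learning_rate, initial_value=0.5):
    """Return the value estimate held before each outcome is observed."""
    value = initial_value
    estimates = []
    for outcome in outcomes:
        estimates.append(value)
        value += learning_rate * (outcome - value)  # delta-rule update
    return estimates

def mean_squared_error(estimates, targets):
    return sum((e - t) ** 2 for e, t in zip(estimates, targets)) / len(targets)

# Volatile, noise-free environment: the outcome flips every 10 trials.
volatile_outcomes = [float((t // 10) % 2 == 0) for t in range(200)]

# Stable, noisy environment: Bernoulli outcomes with a fixed mean of 0.8.
rng = random.Random(0)
stable_mean = 0.8
stable_outcomes = [float(rng.random() < stable_mean) for _ in range(2000)]
stable_targets = [stable_mean] * len(stable_outcomes)

errors = {}
for rate in (0.1, 0.6):
    errors[("volatile", rate)] = mean_squared_error(
        rescorla_wagner(volatile_outcomes, rate), volatile_outcomes)
    errors[("stable", rate)] = mean_squared_error(
        rescorla_wagner(stable_outcomes, rate), stable_targets)

for key, value in sorted(errors.items()):
    print(key, round(value, 4))
```

In this toy setting the higher learning rate tracks the flipping outcome more closely, while in the stable but noisy setting it produces the larger error around the true mean, which is the instability the reviewer raises.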


Reviews: Finite-Dimensional BFRY Priors and Variational Bayesian Inference for Power Law Models

Neural Information Processing Systems

This paper considers finite-dimensional approximations to the stable, generalized gamma, and stable beta processes. The construction uses scaled and exponentially tilted versions of the BFRY distribution. The main advantage of this approximation is that the random variables involved can be simulated easily and admit tractable probability density functions, which makes them amenable to variational algorithms. The paper is well written, and I find its contributions interesting and potentially useful. The main contributions of the paper are in Section 3.2, where the authors show the weak convergence of the finite-dimensional approximations of the stable, generalized gamma and stable beta processes, using the Laplace functional.


Extended Bayesian Information Criteria for Gaussian Graphical Models

Neural Information Processing Systems

Gaussian graphical models with sparsity in the inverse covariance matrix are of significant interest in many modern applications. For the problem of recovering the graphical structure, information criteria provide useful optimization objectives for algorithms searching through sets of graphs or for selection of tuning parameters of other methods such as the graphical lasso, which is a likelihood penalization technique. In this paper we establish the asymptotic consistency of an extended Bayesian information criterion for Gaussian graphical models in a scenario where both the number of variables p and the sample size n grow. Compared to earlier work on the regression case, our treatment allows for growth in the number of non-zero parameters in the true model, which is necessary in order to cover connected graphs. We demonstrate the performance of this criterion on simulated data when used in conjunction with the graphical lasso, and verify that the criterion indeed performs better than either cross-validation or the ordinary Bayesian information criterion when p and the number of non-zero parameters q both scale with n.
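As a rough illustration of how such a criterion can be used for model selection, the sketch below assumes the commonly stated form of the extended BIC for a graph with edge set E: -2 * loglik + |E| * log(n) + 4 * |E| * gamma * log(p), where gamma = 0 gives the ordinary BIC. The abstract itself does not spell out the formula, so treat that form as an assumption. The candidate graphs and their log-likelihood values are made up for the example; in practice they would come from fits such as the graphical lasso at different penalty levels.

```python
import math

def extended_bic(log_likelihood, num_edges, n, p, gamma=0.5):
    """Extended BIC (assumed form); gamma=0 reduces to the ordinary BIC."""
    return (-2.0 * log_likelihood
            + num_edges * math.log(n)
            + 4.0 * num_edges * gamma * math.log(p))

def select_graph(candidates, n, p, gamma):
    """Return the (num_edges, log_likelihood) pair with the smallest criterion."""
    return min(candidates, key=lambda c: extended_bic(c[1], c[0], n, p, gamma))

# Hypothetical candidates: (number of edges, maximized log-likelihood).
candidates = [(5, -520.0), (12, -500.0), (30, -490.0)]
n, p = 100, 20

bic_choice = select_graph(candidates, n, p, gamma=0.0)
ebic_choice = select_graph(candidates, n, p, gamma=1.0)
print("BIC picks", bic_choice[0], "edges; EBIC picks", ebic_choice[0], "edges")
```

With these illustrative numbers the extra log(p) penalty moves the selection toward the sparser graph, which is the intended effect when p is large relative to n.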


Bayesian nonparametric models for bipartite graphs

Neural Information Processing Systems

We develop a novel Bayesian nonparametric model for random bipartite graphs. The model is based on the theory of completely random measures and is able to handle a potentially infinite number of nodes. We show that the model has appealing properties and in particular it may exhibit a power-law behavior. We derive a posterior characterization, an Indian Buffet-like generative process for network growth, and a simple and efficient Gibbs sampler for posterior simulation. Our model is shown to be well fitted to several real-world social networks.


Spatial Normalized Gamma Processes

Neural Information Processing Systems

Dependent Dirichlet processes (DPs) are dependent sets of random measures, each being marginally Dirichlet process distributed. They are used in Bayesian nonparametric models when the usual exchangeability assumption does not hold. We propose a simple and general framework to construct dependent DPs by marginalizing and normalizing a single gamma process over an extended space. The result is a set of DPs, each located at a point in a space such that neighboring DPs are more dependent. We describe Markov chain Monte Carlo inference, involving the typical Gibbs sampling and three different Metropolis-Hastings proposals to speed up convergence.


Generalized Bayesian Inference for Scientific Simulators via Amortized Cost Estimation

Neural Information Processing Systems

Simulation-based inference (SBI) enables amortized Bayesian inference for simulators with implicit likelihoods. But when we are primarily interested in the quality of predictive simulations, or when the model cannot exactly reproduce the observed data (i.e., is misspecified), targeting the Bayesian posterior may be overly restrictive. Generalized Bayesian Inference (GBI) aims to robustify inference for (misspecified) simulator models, replacing the likelihood-function with a cost function that evaluates the goodness of parameters relative to data. However, GBI methods generally require running multiple simulations to estimate the cost function at each parameter value during inference, making the approach computationally infeasible for even moderately complex simulators. Here, we propose amortized cost estimation (ACE) for GBI to address this challenge: We train a neural network to approximate the cost function, which we define as the expected distance between simulations produced by a parameter and observed data.
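The cost function described above, the expected distance between simulations at a parameter and the observed data, can be estimated by plain Monte Carlo. The sketch below shows only that estimation step under simple assumptions: a hypothetical one-dimensional simulator with unit Gaussian noise and a squared-error distance. The amortization step, training a neural network to predict this cost, is not shown.

```python
import random

def simulator(theta, rng):
    """Toy simulator (hypothetical): unit Gaussian noise around the parameter."""
    return theta + rng.gauss(0.0, 1.0)

def squared_distance(x, x_obs):
    return (x - x_obs) ** 2

def monte_carlo_cost(theta, x_obs, num_simulations, rng):
    """Average distance between simulations at theta and the observation."""
    total = 0.0
    for _ in range(num_simulations):
        total += squared_distance(simulator(theta, rng), x_obs)
    return total / num_simulations

rng = random.Random(0)
x_obs = 2.0
thetas = [0.0, 1.0, 2.0, 3.0, 4.0]
costs = {theta: monte_carlo_cost(theta, x_obs, 5000, rng) for theta in thetas}
best_theta = min(costs, key=costs.get)

for theta in thetas:
    print(theta, round(costs[theta], 3))
```

For this toy simulator the exact cost is (theta - x_obs)^2 + 1, so the estimates can be checked by hand. Each evaluation needs thousands of simulator calls, which is the expense that a learned cost estimator is meant to avoid.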


Learning Descriptive Image Captioning via Semipermeable Maximum Likelihood Estimation

Neural Information Processing Systems

Image captioning aims to describe visual content in natural language. As 'a picture is worth a thousand words', there could be various correct descriptions for an image. However, with maximum likelihood estimation as the training objective, the captioning model is penalized whenever its prediction mismatches with the label. For instance, when the model predicts a word expressing richer semantics than the label, it will be penalized and optimized to prefer more concise expressions, referred to as conciseness optimization. In contrast, predictions that are more concise than labels lead to richness optimization. Such conflicting optimization directions could eventually result in the model generating general descriptions.


How to Turn Your Knowledge Graph Embeddings into Generative Models

Neural Information Processing Systems

Some of the most successful knowledge graph embedding (KGE) models for link prediction – CP, RESCAL, TuckER, ComplEx – can be interpreted as energy-based models. Under this perspective they are not amenable to exact maximum-likelihood estimation (MLE) or sampling, and they struggle to integrate logical constraints. This work re-interprets the score functions of these KGEs as circuits – constrained computational graphs allowing efficient marginalisation. Then, we design two recipes to obtain efficient generative circuit models by either restricting their activations to be non-negative or squaring their outputs. Our interpretation comes with little or no loss of performance for link prediction, while the circuits framework unlocks exact learning by MLE, efficient sampling of new triples, and a guarantee that logical constraints are satisfied by design.
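One way to see why squaring the outputs can make marginalisation efficient is the CP case. The sketch below is an illustration under stated assumptions, not the authors' circuit construction: it takes a CP-style score (a sum over rank components of products of subject, relation and object embeddings), squares it, and computes the normalizing constant from three Gram matrices instead of summing over every triple. The embedding sizes and random values are hypothetical.

```python
import itertools
import random

def gram(matrix):
    """Return M^T M for a matrix stored as a list of rows."""
    rank = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(rank)]
            for i in range(rank)]

def cp_score(A, B, C, s, r, o):
    """CP score: sum over rank components of subject * relation * object."""
    return sum(a * b * c for a, b, c in zip(A[s], B[r], C[o]))

def squared_cp_partition(A, B, C):
    """Sum of squared scores over all triples, via elementwise Gram products."""
    gA, gB, gC = gram(A), gram(B), gram(C)
    rank = len(gA)
    return sum(gA[i][j] * gB[i][j] * gC[i][j]
               for i in range(rank) for j in range(rank))

rng = random.Random(0)
num_entities, num_relations, rank = 6, 3, 4
A = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(num_entities)]
B = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(num_relations)]
C = [[rng.gauss(0, 1) for _ in range(rank)] for _ in range(num_entities)]

all_triples = list(itertools.product(
    range(num_entities), range(num_relations), range(num_entities)))

Z_fast = squared_cp_partition(A, B, C)
Z_brute = sum(cp_score(A, B, C, s, r, o) ** 2 for s, r, o in all_triples)

def triple_probability(s, r, o):
    """Normalized probability of a triple under the squared CP model."""
    return cp_score(A, B, C, s, r, o) ** 2 / Z_fast

print(Z_fast, Z_brute)
```

The Gram-matrix route costs on the order of (entities + relations) * rank^2 operations, whereas the brute-force sum grows with entities^2 * relations, so exact normalization stays tractable as the graph grows.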