 Bayesian Inference


Review for NeurIPS paper: Gibbs Sampling with People

Neural Information Processing Systems

Weaknesses: Overall, I thought this was a strong paper. The main concerns I had were as follows: (1) Mode-seeking versus showing the distribution: The aggregated results in the first experiment seem to show much more homogeneity than the results for GSP or MCMCP. It seems like one limitation of this approach might be that there is limited exploration of the space, perhaps making it hard to move between modes and also making it more difficult to see the full shape of the distribution, which I have often taken to be a goal in work using MCMCP. The movement between optimization and seeking a distribution is discussed to some extent in the paper, but I would be interested in seeing this discussed more (and perhaps whether GSP without aggregation is likely to lead to more optimization than MCMCP). In the author response, they have shown additional information suggesting that GSP is more mode-seeking but also does a better job of capturing the distribution.


Review for NeurIPS paper: Gibbs Sampling with People

Neural Information Processing Systems

This paper introduces a new method for eliciting human representations of perceptual concepts, such as what RGB values people think correspond to the color "sunset" or what auditory dimensions (e.g. Rather than eliciting representations via guess-and-check (i.e., start with a dataset and then apply human-generated labels), this method (Gibbs Sampling with People, or GSP) enables inference to go in the other direction (i.e., start with labels, and then identify percepts that match those labels). GSP extends prior work (MCMC with People) to allow eliciting representations of much higher-dimensional stimuli. The reviewers unanimously praised this paper for tackling an important and relevant problem in cognitive science, for its breadth of empirical results, and for its novelty over prior work. R2 stated that the paper is "impressive in scale, scope, and results", R3 stated that it was "very relevant to the NeurIPS community and very novel", and R4 felt there could be "a potentially large impact of this work" with "substantial interest" amongst the NeurIPS community.
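As a rough illustration of the idea summarized above (start from a label and move toward percepts that match it), the sketch below runs a coordinate-wise Gibbs chain in which each update is a single "slider" choice along one stimulus dimension. The human participant is replaced by a hypothetical `simulated_participant` function, and the target percept, slider grid, and temperature are all assumptions made for this example; it is not the authors' implementation.

```python
import numpy as np

def simulated_participant(stimulus, dim, grid, target, rng, temperature=0.02):
    """Hypothetical stand-in for a human slider response.

    Scores every slider position along `dim` by closeness to an assumed
    `target` percept, then samples one position from the softmax of the scores.
    """
    candidates = np.tile(stimulus, (len(grid), 1))
    candidates[:, dim] = grid
    scores = -np.sum((candidates - target) ** 2, axis=1) / temperature
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return rng.choice(grid, p=probs)

def gibbs_chain(n_iters, n_dims, target, seed=0):
    """Cycle through the dimensions, resampling one coordinate per iteration."""
    rng = np.random.default_rng(seed)
    grid = np.linspace(0.0, 1.0, 51)           # discretized slider positions
    stimulus = rng.uniform(0.0, 1.0, size=n_dims)
    samples = np.empty((n_iters, n_dims))
    for t in range(n_iters):
        dim = t % n_dims                        # coordinate updated this step
        stimulus[dim] = simulated_participant(stimulus, dim, grid, target, rng)
        samples[t] = stimulus
    return samples

target = np.array([0.9, 0.5, 0.2])              # assumed RGB-like target percept
samples = gibbs_chain(n_iters=300, n_dims=3, target=target)
late_mean = samples[100:].mean(axis=0)          # average after a short burn-in
```

Because each coordinate is drawn from its conditional distribution given the others, the chain is a Gibbs sampler over the simulated participant's (discretized) utility; in a real study the conditional draw would come from a person's response rather than a softmax.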


Scalable Quasi-Bayesian Inference for Instrumental Variable Regression

Neural Information Processing Systems

Recent years have witnessed an upsurge of interest in employing flexible machine learning models for instrumental variable (IV) regression, but the development of uncertainty quantification methodology is still lacking. In this work we present a scalable quasi-Bayesian procedure for IV regression, building upon the recently developed kernelized IV models. Contrary to Bayesian modeling for IV, our approach does not require additional assumptions on the data generating process, and leads to a scalable approximate inference algorithm with time cost comparable to the corresponding point estimation methods. Our algorithm can be further extended to work with neural network models. We analyze the theoretical properties of the proposed quasi-posterior, and demonstrate through empirical evaluation the competitive performance of our method.


Review for NeurIPS paper: Hamiltonian Monte Carlo using an adjoint-differentiated Laplace approximation: Bayesian inference for latent Gaussian models and beyond

Neural Information Processing Systems

Weaknesses: My main questions regarding the paper: 1) When computing the Laplace approximation, this still requires calculation of the Hessian, which I believe is with respect to the latent (theta). This is referred to as W in Algorithm 1. Would it be possible to comment further on the kind of trade-off between implementing full-HMC, versus the overhead of calculating the Hessian? I think this is the issue you are referring to in the second paragraph of the discussion section, whereby you mention higher-order automatic differentiation. I assume you stick to analytical Hessians (e.g. For example "Semi-Separable Hamiltonian Monte Carlo for Inference in Bayesian Hierarchical Models" by Zhang and Sutton jointly sample over hyperparameters and parameters to overcome similar funnel-like behaviours to that of the Gaussian latent variable models that you explore.
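To make the role of the Hessian concrete, here is a minimal sketch of a Laplace approximation for a latent Gaussian model, showing where a matrix W (the negative Hessian of the log-likelihood with respect to the latent theta) enters each Newton step. The Poisson likelihood with log link, the squared-exponential covariance `K`, the backtracking rule, and all function names are assumptions chosen for illustration; this is not the paper's Algorithm 1.

```python
import numpy as np

def log_joint(theta, y, K_inv):
    """log p(y | theta) + log p(theta), up to additive constants."""
    return np.sum(y * theta - np.exp(theta)) - 0.5 * theta @ K_inv @ theta

def laplace_approximation(y, K, max_iters=50, tol=1e-8):
    """Newton search for the posterior mode of theta, with
    theta ~ N(0, K) and y_i ~ Poisson(exp(theta_i)) (assumed model)."""
    K_inv = np.linalg.inv(K)
    theta = np.zeros(len(y))
    for _ in range(max_iters):
        grad = (y - np.exp(theta)) - K_inv @ theta   # gradient of log joint
        W = np.diag(np.exp(theta))                   # negative Hessian of log-likelihood
        step = np.linalg.solve(K_inv + W, grad)      # Newton direction
        scale = 1.0
        # Backtrack so that each accepted step does not decrease the objective.
        while scale > 1e-8 and (
            log_joint(theta + scale * step, y, K_inv) < log_joint(theta, y, K_inv)
        ):
            scale *= 0.5
        theta = theta + scale * step
        if np.max(np.abs(scale * step)) < tol:
            break
    W = np.diag(np.exp(theta))
    cov = np.linalg.inv(K_inv + W)                   # covariance of the Gaussian approximation
    return theta, cov

x = np.arange(5.0)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2) + 1e-3 * np.eye(5)
y = np.array([0.0, 1.0, 3.0, 2.0, 5.0])
theta_hat, cov = laplace_approximation(y, K)
```

For this likelihood W is diagonal and available in closed form, which keeps each Newton step cheap; the trade-off raised in the review concerns models where the Hessian is not available this way and must be derived or obtained by higher-order automatic differentiation.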


Reviews: Dichotomize and Generalize: PAC-Bayesian Binary Activated Deep Neural Networks

Neural Information Processing Systems

One contribution is a new approach for training neural networks with binary activations. The second contribution is PAC-Bayesian generalization bounds for binary activated neural networks that, when used as the training objective, come very close to test accuracy (i.e. The gap between the training and test performance is also much smaller. I think this is very promising for training more robust networks. The method actually recovers variational Bayesian learning when the coefficient C is fixed, but in contrast to it, this coefficient is learned in a principled way.


A New Approach for Knowledge Generation Using Active Inference

arXiv.org Artificial Intelligence

Various models have been proposed for how knowledge is generated in the human brain, including the semantic network model. Although this model has been widely studied and computational versions of it have been presented, its limits and inefficiencies in generating different types of knowledge restrict its application to semantic knowledge: it was formed around semantic memory and declarative knowledge, and it has many limits in explaining procedural and conditional knowledge. Given the importance of an appropriate model for knowledge generation, especially for improving human cognitive functions or building intelligent machines, improving existing models or providing more comprehensive ones is of great importance. In the current study, based on the free energy principle of the brain, the researchers propose a model for generating three types of knowledge: declarative, procedural, and conditional. While explaining the different types of knowledge, this model can compute and generate concepts from stimuli based on probabilistic mathematics and the action-perception process (active inference). The proposed model learns in an unsupervised manner and can update itself using a combination of different stimuli; as a generative model, it can generate new concepts from unlabeled stimuli. In this model, the active inference process is used to generate procedural and conditional knowledge, and the perception process is used to generate declarative knowledge.


Reviews: A Polynomial Time Algorithm for Log-Concave Maximum Likelihood via Locally Exponential Families

Neural Information Processing Systems

Post-rebuttal: The authors have promised to incorporate an exposition of the sampler in the revised paper; I believe that will make the paper a more self-contained read. I maintain my rating of strong accept (8). I think this paper makes very nice contributions to the fundamental question of estimating the MLE distribution given a bunch of observations. I think the contributions can be broken up into two key parts: - A bunch of simple but elegant structural results for the MLE distribution in terms of 'tent distributions': distributions whose log-density is piecewise linear and which are supported over subdivisions of the convex hull of the data points. This allows them to write a convex program for optimizing over tent distributions.
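The notion of a tent distribution can be illustrated in one dimension, where the convex hull of the data is an interval and the subdivision is just the sorted data points. The sketch below assumes the heights are already concave over the points, so linear interpolation gives the tent function; the example points, heights, and function names are hypothetical, and the paper's setting is arbitrary dimension, which this does not cover.

```python
import numpy as np

def tent_log_density(x_query, points, heights):
    """Piecewise-linear log-density on [points[0], points[-1]], -inf outside.

    Assumes `points` is sorted and `heights` is concave over it, so plain
    linear interpolation coincides with the tent function.
    """
    x_query = np.asarray(x_query, dtype=float)
    inside = (x_query >= points[0]) & (x_query <= points[-1])
    return np.where(inside, np.interp(x_query, points, heights), -np.inf)

def tent_normalizer(points, heights):
    """Closed-form integral of exp(tent) over the hull, segment by segment."""
    widths = np.diff(points)
    a, b = heights[:-1], heights[1:]
    diff = b - a
    flat = np.abs(diff) < 1e-12
    safe_diff = np.where(flat, 1.0, diff)
    # Integral of exp of a line rising from a to b over width w:
    # w * (e^b - e^a) / (b - a), or w * e^a when the segment is flat.
    segments = np.where(flat,
                        widths * np.exp(a),
                        widths * (np.exp(b) - np.exp(a)) / safe_diff)
    return segments.sum()

points = np.array([0.0, 1.0, 2.5, 4.0])        # hypothetical sorted data points
heights = np.array([-2.0, 0.0, -0.5, -3.0])    # hypothetical concave log-heights
log_Z = np.log(tent_normalizer(points, heights))
normalized = heights - log_Z                    # heights of a proper density

# Numerical check by the trapezoid rule on a fine grid.
grid = np.linspace(points[0], points[-1], 200001)
density = np.exp(tent_log_density(grid, points, normalized))
total_mass = np.sum(0.5 * (density[1:] + density[:-1]) * np.diff(grid))
```

Since the density is determined entirely by the heights at the data points, searching over tent distributions reduces to searching over a finite vector of heights, which is what makes an optimization formulation possible.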


Reviews: A Polynomial Time Algorithm for Log-Concave Maximum Likelihood via Locally Exponential Families

Neural Information Processing Systems

The submission provides a polynomial-time approximation algorithm for finding the maximum-likelihood log-concave density for a given set of data points in R^d, for arbitrary d. The work is theoretical in nature, with proofs and no experiments. The problem is very interesting, since log-concave distributions include many of the commonly used parametric families (such as Gaussian), and the log-concave MLE also has other interesting properties. Previously the sample complexity of learning a log-concave distribution has been studied, but a polynomial-time algorithm has been lacking. The present work provides such an algorithm.


Review for NeurIPS paper: Distributionally Robust Parametric Maximum Likelihood Estimation

Neural Information Processing Systems

Since everything is parametric, I'd expect explicit rates of convergence involving all problem complexity parameters (n, m, p, etc.). To make the rest of my points clear, let me recall that the following notations are used in the paper: - n: the dimensionality of the covariate (i.e., feature vector) X. Thus X is a random vector in R^n. BTW, in the context of ML or stats, I'd use another notation here, as n conventionally stands for "sample size".


Review for NeurIPS paper: Distributionally Robust Parametric Maximum Likelihood Estimation

Neural Information Processing Systems

This paper proposes a method for distributionally robust optimization under KL ambiguity sets for exponential families. Although KL ambiguity sets have their drawbacks, in particular not covering any changes in the inputs x, the present work produces a standard conic problem for a wide problem class via a novel analysis, provides good theoretical analysis, and yields good numerical results for a variety of small-scale classification problems. With the various clarifications that came up in the reviews, this paper makes a solid contribution to the DRO literature and will be quite welcome to the NeurIPS audience.