 Bayesian Learning


Review for NeurIPS paper: Deep Relational Topic Modeling via Graph Poisson Gamma Belief Network

Neural Information Processing Systems

The paper, the reviews, the author response and the ensuing discussion were all taken into consideration. All reviewers considered the work marginally above the acceptance threshold. Novelty was a concern for some reviewers, but others appreciated it. Further concerns were the lack of comparisons to GCN and other methods, the missing evaluation of the underlying topics, and limited consideration of prior work on topic modeling. However, the paper was generally felt to represent good work, and the use of a deep model in this context, the design of the model, and the convincing experiments were appreciated.


Uncertainty Quantification With Noise Injection in Neural Networks: A Bayesian Perspective

arXiv.org Machine Learning

Model uncertainty quantification involves measuring and evaluating the uncertainty linked to a model's predictions, helping assess their reliability and confidence. Noise injection is a technique used to enhance the robustness of neural networks by introducing randomness. In this paper, we establish a connection between noise injection and uncertainty quantification from a Bayesian standpoint. We theoretically demonstrate that injecting noise into the weights of a neural network is equivalent to Bayesian inference on a deep Gaussian process. Consequently, we introduce a Monte Carlo Noise Injection (MCNI) method, which involves injecting noise into the parameters during training and performing multiple forward propagations during inference to estimate the uncertainty of the prediction. Through simulation and experiments on regression and classification tasks, our method demonstrates superior performance compared to the baseline model.
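The inference step described in the abstract (several forward passes with noisy parameters, then summary statistics over the outputs) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tiny network, the additive Gaussian noise, and the names `forward`, `perturb`, `mc_noise_predict`, and `sigma` are all assumptions, and the noisy training phase is omitted.

```python
import numpy as np

def forward(params, x):
    """Tiny one-hidden-layer network (a stand-in model, assumed for illustration)."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2

def perturb(params, sigma, rng):
    """Return a copy of the parameters with additive Gaussian noise (assumed noise type)."""
    return [p + sigma * rng.standard_normal(p.shape) for p in params]

def mc_noise_predict(params, x, sigma=0.05, n_samples=100, seed=0):
    """Run n_samples noisy forward passes; return the predictive mean and std."""
    rng = np.random.default_rng(seed)
    preds = np.stack(
        [forward(perturb(params, sigma, rng), x) for _ in range(n_samples)]
    )
    return preds.mean(axis=0), preds.std(axis=0)

# Usage with random stand-in weights (no training is performed here).
init_rng = np.random.default_rng(1)
params = [
    init_rng.standard_normal((3, 8)),  # w1
    np.zeros(8),                       # b1
    init_rng.standard_normal((8, 1)),  # w2
    np.zeros(1),                       # b2
]
x = init_rng.standard_normal((5, 3))
mean, std = mc_noise_predict(params, x)
```

The per-input standard deviation serves as the uncertainty estimate in this sketch; with `sigma=0` every pass is identical and the spread collapses to zero.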



Reviews: Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much

Neural Information Processing Systems

I think this paper addresses an important issue and makes valuable contributions, and thus should be published. I have a few concerns, hence my lower rating for the last question above, although I think they could be addressed relatively easily. I think this is fundamentally *OK* and even perhaps a positive thing. However, I think a bit more discussion needs to be given to how the arguments might be made more formal. For example, in Section 2.1, I think the proof is intended to hold only in the limit of M going to infinity. Please give a statement of what should hold in what limit; this wasn't clear to me.


Reviews: Learning Treewidth-Bounded Bayesian Networks with Thousands of Variables

Neural Information Processing Systems

The proposed method is very similar to previous work by Nie et al. -- both use k-trees to search for low-treewidth Bayesian networks, both start with a randomly chosen initial clique, and both propose using an A* method for finding the best tree. The differences are that Nie et al. score k-trees using a mutual information score and use BDeu for choosing the final consistent Bayesian network, while this paper proposes using BIC and incrementally building the Bayesian network along with the k-tree, using the BN to score the k-tree. This paper also includes the additional restriction that the complete variable (partial) order is chosen randomly, while in Nie et al. The main justification for these differences is the ability to scale to large treewidths. However, in the experiments, the previous S2 algorithm can also scale to large treewidths.


Reviews: A Bayesian method for reducing bias in neural representational similarity analysis

Neural Information Processing Systems

The paper explains well how computing RSA using estimates of regression weights can result in a biased similarity matrix. However, in many cases in neuroscience, the RSA is computed directly on the patterns of activity, and not on the estimates of the regression weights beta. This diminishes the relevance of this paper to the neuroscience field. The authors very briefly address this alternate way of computing RSA in lines 123-128. It is unclear how this alternative RSA computation is biased if it does not depend on a proxy for beta estimates, and this point needs to be addressed further.


Reviews: Confusions over Time: An Interpretable Bayesian Model to Characterize Trends in Decision Making

Neural Information Processing Systems

The authors motivate the proposed model with the setting in which items have "true" but unobserved labels/ratings and the observed labels/ratings given by evaluators are potentially incorrect. This differs from the very common problem in recommendation systems or collaborative filtering where evaluators provide their subjective ratings but there is not assumed to be any "true" rating (e.g., users of Netflix giving 1-5 star ratings to movies). This seems like a common but underexplored setting that is worthy of further study within machine learning. The authors are also right to highlight interpretability as a desired aspect of any machine learning solution that may yield post-hoc insights into common human biases and thus suggest corrective measures. This paper does a good job of motivating the proposed model and situating it within the crowdsourcing and human annotation literature.


Reviews: Near-Optimal Smoothing of Structured Conditional Probability Matrices

Neural Information Processing Systems

If my understanding is correct, Theorem 1 of the authors does not quite apply to their algorithm ADD-1/2-Smoothed Low-Rank. Instead, it applies to the non-computable algorithm where they assume that they have a minimizer of the objective function in Theorem 3. It is not clear if the alternating optimization algorithm proposed in the paper is guaranteed to converge to a minimizer of the objective in Theorem 3. If this is true, the authors should mention this before stating Theorem 1 to avoid misleading the reader. The "discounting" seems important from the Experiments section, but it is not described in the main paper. If it is so important, the authors should make room for it in the main paper. The main results (Theorems 1 and 2) are not so surprising given that this is almost a parametric estimation problem with mk parameters (so the rates should be km/n).
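For readers unfamiliar with the "add-1/2" part of the algorithm's name, the sketch below shows plain add-constant smoothing of a conditional probability matrix estimated from counts. It is only the row-wise smoothing step, assumed here for illustration; it is not the low-rank alternating optimization the review discusses, and the function name `add_constant_smoothing` and the parameter `beta` are hypothetical.

```python
import numpy as np

def add_constant_smoothing(counts, beta=0.5):
    """Row-wise add-beta estimate of P(column | row) from a count matrix.

    Each entry becomes (count + beta) / (row total + beta * k),
    where k is the number of columns, so every row sums to 1.
    """
    counts = np.asarray(counts, dtype=float)
    k = counts.shape[1]
    row_totals = counts.sum(axis=1, keepdims=True)
    return (counts + beta) / (row_totals + beta * k)

# Usage: the second row has no observations and falls back to uniform.
probs = add_constant_smoothing([[3, 0, 1], [0, 0, 0]])
```

Unseen outcomes receive a small nonzero probability instead of zero, which is the basic motivation for smoothing sparse count data.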


Reviews: Rényi Divergence Variational Inference

Neural Information Processing Systems

This is a very good and technically sound paper, containing a significant amount of material. The theoretical investigation of the properties of alpha-divergence minimization is thorough, clear and detailed. The paper provides significant theoretical insight and understanding into alpha-divergence minimization and optimization-based approximate inference in general. My biggest concern about the alpha-divergence framework is whether its theoretical richness and elegance actually translate to practical methods. In other words, I'm not sure that its practical aspects are appealing enough to convince practitioners of variational inference to switch to alpha-divergence minimization.
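For reference, the divergence family the review refers to can be written out as below. The notation (p, q, theta, x, alpha) is assumed here and may differ from the paper's; the second line is the standard form of the variational Rényi bound as I understand it, given as a sketch rather than a quotation from the paper.

```latex
% Renyi's alpha-divergence between densities p and q (alpha > 0, alpha != 1)
D_\alpha(p \,\|\, q) = \frac{1}{\alpha - 1}
  \log \int p(\theta)^{\alpha} \, q(\theta)^{1 - \alpha} \, d\theta ,
\qquad
\lim_{\alpha \to 1} D_\alpha(p \,\|\, q) = \mathrm{KL}(p \,\|\, q).

% Variational Renyi bound for an approximate posterior q(theta)
\mathcal{L}_\alpha(q; x) = \frac{1}{1 - \alpha}
  \log \mathbb{E}_{q(\theta)}\!\left[
    \left( \frac{p(\theta, x)}{q(\theta)} \right)^{1 - \alpha}
  \right]
  = \log p(x) - D_\alpha\big(q(\theta) \,\|\, p(\theta \mid x)\big).
```

In the limit alpha to 1 the bound reduces to the usual evidence lower bound, which is why standard variational inference appears as a special case of this framework.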


Reviews: Reward Augmented Maximum Likelihood for Neural Structured Prediction

Neural Information Processing Systems

The paper is a superbly written account of a simple idea that appears to work very well. The approach can straightforwardly be applied to existing max-likelihood (ML) trained models so that, in principle, the task reward is taken into account during training, and it is computationally much more efficient than alternative non-ML-based approaches. This work risks being underappreciated as proposing merely a simple addition of artificial structured-label noise, but I think the specific link with structured output task reward is sufficiently original, and the paper also uncovers important theoretical insight by revealing the formal relationship between the proposed reward augmented ML and RL-based regularized expected reward objectives. So while it works surprisingly well, you haven't yet clearly demonstrated empirically that using a truly *task-reward derived* payoff distribution is beneficial. One way to convincingly demonstrate that would be to carry out your envisioned BLEU importance-reweighted sampling and show that it improves the BLEU test score over your current simpler edit-distance-based label noise.
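The "payoff distribution" mentioned in the review can be illustrated with a small sketch. It assumes the exponentiated form, in which the probability of an output is proportional to exp(reward / tau), with the reward taken to be the negative edit distance to the reference. The explicit candidate list, the temperature name `tau`, and both function names are assumptions for illustration; the paper samples from the full output space rather than enumerating candidates.

```python
import math

def edit_distance(a, b):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (x != y),  # substitution (free if symbols match)
            ))
        prev = curr
    return prev[-1]

def payoff_distribution(target, candidates, tau=1.0):
    """Probabilities proportional to exp(-edit_distance(candidate, target) / tau)."""
    rewards = [-edit_distance(c, target) for c in candidates]
    top = max(rewards)  # subtract the maximum for numerical stability
    weights = [math.exp((r - top) / tau) for r in rewards]
    total = sum(weights)
    return [w / total for w in weights]

# Usage: candidates closer to the reference receive more probability mass.
probs = payoff_distribution("abc", ["abc", "abd", "xyz"], tau=1.0)
```

Swapping the reward for a task metric such as BLEU would correspond to the experiment the reviewer asks for; only the `rewards` line would change in this sketch.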