Bayesian Inference
Predictive Coresets
We propose a construction of coresets based on a predictive view of Bayesian posterior inference (Fong et al., 2024; Fortini and Petrone, 2012). The main attraction of the approach is its model-agnostic nature: the method is valid with any inference model and is independent of the specific inference goals, making it highly adaptable to a wide range of applications. Such adaptability is particularly valuable for large-scale datasets, now commonplace in fields like genomics and astronomy. While this explosion of data offers incredible opportunities for discovery, it also brings significant computational challenges. Tasks that were once straightforward, such as evaluating likelihoods several times, have become increasingly difficult, making traditional data processing methods impractical. These obstacles have frequently pushed practitioners toward simpler statistical models that might not capture the full complexity of the data, sacrificing the expressiveness and flexibility that rich hierarchical and nonparametric models can offer.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
SUMMARY: This paper studies the effect of noise correlation in some models of multi-output regression. It argues that a method that does not exploit the correlation, such as Ordinary Least Squares (OLS), may perform much worse than a method that does, such as Maximum Likelihood Estimation (MLE). For certain linear models studied in the paper (the Pooled model and Seemingly Unrelated Regression), the MLE requires joint optimization of the covariance and the regression weights, which is a non-convex problem. The Alternating Minimization (AltMin) algorithm addresses this problem by iteratively optimizing the covariance and the weights, each with the other held fixed.
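The alternating scheme described in this summary can be sketched as follows. This is a minimal illustration and not the paper's implementation: it assumes a pooled model y_i = X_i w + noise with Gaussian noise of covariance Sigma shared across samples, and the function name `altmin_pooled`, the `ridge` stabilizer, and the array layout are all hypothetical choices made for this example.

```python
import numpy as np

def altmin_pooled(X, Y, n_iters=20, ridge=1e-8):
    """Alternating minimization sketch for a pooled multi-output model.

    X: array of shape (n, m, d), one (m, d) design matrix per sample.
    Y: array of shape (n, m), one m-dimensional output per sample.
    Returns the shared weight vector w (d,) and the noise covariance (m, m).
    """
    n, m, d = X.shape
    sigma_inv = np.eye(m)  # identity precision: the first w-step is plain OLS
    for _ in range(n_iters):
        # w-step: generalized least squares with the covariance held fixed
        A = np.einsum("nmd,mk,nke->de", X, sigma_inv, X)
        b = np.einsum("nmd,mk,nk->d", X, sigma_inv, Y)
        w = np.linalg.solve(A + ridge * np.eye(d), b)
        # covariance-step: empirical covariance of residuals with w held fixed
        R = Y - np.einsum("nmd,d->nm", X, w)
        sigma = R.T @ R / n
        sigma_inv = np.linalg.inv(sigma + ridge * np.eye(m))
    return w, sigma
```

Each step has a closed form even though the joint problem is non-convex, which is what makes the alternation attractive. The small `ridge` term is only a numerical safeguard added here for the sketch.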
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The paper proposes a mechanism for explaining Bayesian inference and network plasticity in the brain using an algorithm very similar to Stochastic Gradient Langevin Dynamics. Clarity: The paper is well written. Even though my background is machine learning and not neuroscience, I was able to follow most of the paper. Originality: The mechanism itself is well studied in the machine learning literature where it is called Stochastic Gradient Langevin Dynamics (SGLD) (see Ref[1] and analysis in Ref[2]). This is also well known in physics where it is usually called the Langevin equation with noisy force (see e.g.
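For readers unfamiliar with SGLD, the generic update can be sketched on a toy problem. This is the standard machine-learning form of the algorithm, not the synaptic model of the reviewed paper; the Gaussian-mean target, the function name `sgld_gaussian_mean`, and all default values are assumptions made for illustration.

```python
import numpy as np

def sgld_gaussian_mean(data, n_steps=5000, batch_size=32, step=1e-4,
                       prior_var=100.0, seed=0):
    """SGLD sketch for the posterior over the mean of unit-variance Gaussian data.

    Prior: theta ~ N(0, prior_var). Likelihood: x_i ~ N(theta, 1).
    Returns the chain of sampled theta values, one per step.
    """
    rng = np.random.default_rng(seed)
    N = len(data)
    theta = 0.0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        batch = rng.choice(data, size=batch_size, replace=False)
        grad_prior = -theta / prior_var
        # minibatch gradient of the log-likelihood, rescaled to the full dataset
        grad_lik = (N / batch_size) * np.sum(batch - theta)
        # half-step along the stochastic gradient plus Gaussian noise of variance `step`
        theta += 0.5 * step * (grad_prior + grad_lik) + rng.normal(0.0, np.sqrt(step))
        samples[t] = theta
    return samples
```

With a constant step size the chain only approximates the posterior, because the minibatch gradient noise adds to the injected noise; a decreasing step schedule is the usual remedy.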
Review for NeurIPS paper: A Limitation of the PAC-Bayes Framework
Weaknesses: The paper is technically heavy for my expertise, so I can only raise questions about its content. Even if they are naive, discussing them in the paper would help other readers understand the scope of this work. A first concern is that the paper presents (Theorem 1) solely the PAC-Bayes bound of McAllester (1999), which converges at rate sqrt(1/m). Since this pioneering work, many variations on PAC-Bayes bounds have been proposed. Notably, the bounds of Seeger (2002) and Catoni (2007) are known to converge at rate 1/m when the empirical risk is zero (see also Guedj (2019) for an up-to-date overview of the PAC-Bayes literature).
Review for NeurIPS paper: Instance Based Approximations to Profile Maximum Likelihood
Summary and Contributions: Statistical property estimation is an important and active area at the intersection of theoretical computer science, statistics, and information theory. For example, a basic question in this realm is: given n iid samples from an unknown discrete distribution p, how well can we estimate the entropy H(p), and what is an efficient algorithm for doing so? Recent efforts have shown that, for any symmetric property, the profile maximum likelihood estimator is universally minimax optimal for a wide range of parameters. While this at first seemed like a purely theoretical result, algorithmic efforts quickly caught up to show that 1) efficient approximation of the profile maximum likelihood estimator is possible and 2) approximate profile maximum likelihood estimation suffices for minimax optimality. In this context, this paper refines recent approximation algorithms from exp(-\sqrt{n} log n) to exp(-k log n), where k is the number of observed frequencies, with k = O(\sqrt{n}).
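The quantities in this summary can be made concrete with a small sketch. It computes the profile of a sample (how many symbols occur with each frequency) and the number k of distinct observed frequencies, and it evaluates the plug-in entropy from the profile alone, which illustrates why a symmetric property depends on the sample only through its profile. The plug-in estimator is used purely for illustration and is not the profile maximum likelihood estimator; the function names are hypothetical.

```python
from collections import Counter
import math

def profile(sample):
    """Map each observed frequency to the number of symbols having that frequency."""
    symbol_counts = Counter(sample)
    return Counter(symbol_counts.values())

def plugin_entropy_from_profile(prof, n):
    """Empirical (plug-in) entropy in nats, computed from the profile only."""
    return -sum(mult * (freq / n) * math.log(freq / n) for freq, mult in prof.items())

def plugin_entropy(sample):
    """Empirical (plug-in) entropy in nats, computed from the raw symbol counts."""
    n = len(sample)
    return -sum((c / n) * math.log(c / n) for c in Counter(sample).values())
```

For the sample "abracadabra" (n = 11), the symbol counts are a:5, b:2, r:2, c:1, d:1, so there are k = 3 distinct frequencies. Since k distinct positive frequencies sum to at least k(k+1)/2, k can never exceed sqrt(2n).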
Export Reviews, Discussions, Author Feedback and Meta-Reviews
High-dimensional neural spike train analysis with generalized count linear dynamical systems. This paper describes a general exponential-family model (called the "generalized count" (GC) distribution) for multi-neuron spike count data. The model accounts for both under-dispersed and over-dispersed spike count data, and has Poisson, Negative Binomial, Bernoulli, and several other classic models as special cases. The authors give a clear account of the relationship to other models, and demonstrate the need for a model to capture under-dispersed counts in primate motor cortex. They then describe an efficient method for maximum-likelihood fitting (and demonstrate concavity of the log-likelihood). They derive an efficient variational Bayesian inference method and apply the model to data from primate motor cortex, showing that it accounts more accurately for variance and cross-covariance of spike count data, compared to a model with Poisson observations.
Export Reviews, Discussions, Author Feedback and Meta-Reviews
This paper is about a new Bayesian method for multi-label learning. The goal is to classify accurately in settings where there are many potential labels but only a few of them apply to each data point. The basis of the new results is a new generative model for the label vector of each example. Specifically, the label vector y_n of the n-th example is generated as y_n = f(V(\sigma(Wx_n))), where Wx_n is a lower-dimensional projection of the n-th instance x_n, followed by an element-wise sigmoid activation \sigma. The final operation f corresponds to drawing Poisson random variables with rates given by V(\sigma(Wx_n)) and thresholding these so-called latent counts by taking the minimum with 1.
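The generative step described in this review can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the function name `sample_labels` is hypothetical, and V is assumed to have non-negative entries so that the Poisson rates are valid.

```python
import numpy as np

def sample_labels(x, W, V, rng):
    """Draw one binary label vector from the thresholded-Poisson model.

    x: instance of shape (D,); W: projection of shape (K, D);
    V: non-negative matrix of shape (L, K) (assumed); rng: numpy Generator.
    """
    hidden = 1.0 / (1.0 + np.exp(-(W @ x)))  # element-wise sigmoid of the projection
    rates = V @ hidden                       # one Poisson rate per label
    counts = rng.poisson(rates)              # latent counts
    return np.minimum(counts, 1)             # threshold: label is 1 if the count is positive
```

Under this construction each label equals 1 with probability 1 - exp(-rate), so small rates give the sparse label vectors the setting calls for.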
Export Reviews, Discussions, Author Feedback and Meta-Reviews
This paper proposed the population posterior distribution for Bayesian modeling of streams of data and showed how stochastic optimization could be used to find a good approximation. The proposed framework and algorithm were demonstrated on both latent Dirichlet allocation and Dirichlet process mixture models on text and geolocation data and were shown to perform better than previous work in some cases. Overall, I think the main idea of the paper is very interesting and it would fit in well at NIPS. There are a few aspects of the paper that could use some more discussion though. First, the authors were very careful throughout the paper to use the term "Bayesian modeling", except the title uses "Bayesian inference", which this paper definitely does not provide a method for.
Probabilistic Artificial Intelligence
Krause, Andreas, Hübotter, Jonas
Artificial intelligence commonly refers to the science and engineering of artificial systems that can carry out tasks generally associated with requiring aspects of human intelligence, such as playing games, translating languages, and driving cars. In recent years, there have been exciting advances in learning-based, data-driven approaches towards AI, and machine learning and deep learning have enabled computer systems to perceive the world in unprecedented ways. Reinforcement learning has enabled breakthroughs in complex games such as Go and challenging robotics tasks such as quadrupedal locomotion. A key aspect of intelligence is to not only make predictions, but reason about the uncertainty in these predictions, and to consider this uncertainty when making decisions. This is what this manuscript on "Probabilistic Artificial Intelligence" is about. The first part covers probabilistic approaches to machine learning. We discuss the differentiation between "epistemic" uncertainty due to lack of data and "aleatoric" uncertainty, which is irreducible and stems, e.g., from noisy observations and outcomes. We discuss concrete approaches towards probabilistic inference and modern approaches to efficient approximate inference. The second part of the manuscript is about taking uncertainty into account in sequential decision tasks. We consider active learning and Bayesian optimization -- approaches that collect data by proposing experiments that are informative for reducing the epistemic uncertainty. We then consider reinforcement learning and modern deep RL approaches that use neural network function approximation. We close by discussing modern approaches in model-based RL, which harness epistemic and aleatoric uncertainty to guide exploration, while also reasoning about safety.