Bayesian Learning
Quantification via Gaussian Latent Space Representations
Pรฉrez-Mon, Olaya, del Coz, Juan Josรฉ, Gonzรกlez, Pablo
Quantification, or prevalence estimation, is the task of predicting the prevalence of each class within an unknown bag of examples. Most existing quantification methods in the literature rely on prior probability shift assumptions to create a quantification model that uses the predictions of an underlying classifier to make optimal prevalence estimates. In this work, we present an end-to-end neural network that uses Gaussian distributions in latent spaces to obtain invariant representations of bags of examples. This approach addresses the quantification problem using deep learning, enabling the optimization of specific loss functions relevant to the problem and avoiding the need for an intermediate classifier, tackling the quantification problem as a direct optimization problem. Our method achieves state-of-the-art results, both against traditional quantification methods and other deep learning approaches for quantification. The code needed to reproduce all our experiments is publicly available at https://github.com/AICGijon/gmnet.
Ranking with Confidence for Large Scale Comparison Data
Valdeira, Filipa, Soares, Clรกudia
In this work, we leverage a generative data model considering comparison noise to develop a fast, precise, and informative ranking algorithm from pairwise comparisons that produces a measure of confidence on each comparison. The problem of ranking a large number of items from noisy and sparse pairwise comparison data arises in diverse applications, like ranking players in online games, document retrieval or ranking human perceptions. Although different algorithms are available, we need fast, large-scale algorithms whose accuracy degrades gracefully when the number of comparisons is too small. Fitting our proposed model entails solving a non-convex optimization problem, which we tightly approximate by a sum of quasi-convex functions and a regularization term. Resorting to an iterative reweighted minimization and the Primal-Dual Hybrid Gradient method, we obtain PD-Rank, achieving a Kendall tau 0.1 higher than all comparing methods, even for 10\% of wrong comparisons in simulated data matching our data model, and leading in accuracy if data is generated according to the Bradley-Terry model, in both cases faster by one order of magnitude, in seconds. In real data, PD-Rank requires less computational time to achieve the same Kendall tau than active learning methods.
A note on the relations between mixture models, maximum-likelihood and entropic optimal transport
Vayer, Titouan, Lasalle, Etienne
The relations between maximum-likelihood and optimal transport (OT) have already been discussed in multiple works (Rigollet and Weed, 2018; Mena et al., 2020; Diebold et al., 2024). The purpose of this brief note is to provide the key tools used to establish these connections. The primary aim is pedagogical: we will focus on the (discrete) mixtures case, adopting a "computational OT" perspective. Hopefully, readers will find this exercise insightful. Our analysis will largely rely on the approach described in Rigollet and Weed (2018), though adapted to a different formalism and applied to a slightly different problem (mixture estimation rather than Gaussian deconvolution).
Review for NeurIPS paper: DAGs with No Fears: A Closer Look at Continuous Optimization for Learning Bayesian Networks
Weaknesses: The problem in the paper is that it fails in showing the actual scope of the new results, especially in the global context of BNs learning. In fact their methods apparently can only applied to the continuous case: no mention is ever made if the same method can work with categorical variables. This is reflected to the selected set of "state-of-the-art" methods against which they compare their methods, that is a narrow subset of the whole literature on BNs learning. Saying something like "As mentioned, this paper is most closely related to the fully continuous framework of ... " is definitely not enough: a more precise and thorough description of the limitations of this work, and its position in the whole BNs learning literature, is needed. The title and the abstract should modified as well with same reasoning.
Reviews: Poisson-Minibatching for Gibbs Sampling with Convergence Rate Guarantees
Summary: This paper introduces Poisson auxiliary variables to facilitate minibatch sampling. The key insight is with the appropriate Poisson parameterization, the joint distribution (Eq. The authors apply this insight to discrete-state Gibbs sampling (Algorithm 2), Metropolis Hastings (Supplement), and continuous-state Gibbs sampling (Alg 3. and 5). The authors also develop spectral gap lower bounds for all proposed Gibbs sampling methods, which provides a rough guideline for choosing a tuning parameter \lambda and comparing the (asymptotic) per iteration runtime of the methods (Table 1). Finally the authors evaluate the Gibbs methods on synthetic data, showing that their proposed method performs similarly to Gibbs while outperforming alternatives.
Review for NeurIPS paper: Bidirectional Convolutional Poisson Gamma Dynamical Systems
Summary and Contributions: The paper presents a new hierarchical Bayesian model -- convolutional Poisson-Gamma Dynamical Systems (conv-PGDS) -- for generating the observed words in a document corpus. Globally, the model assumes there are K "topic filters", D_1, ... D_K, which are distributions over 3-grams from a finite size vocabulary (size V). Each "topic" (indexed by k) has an appearance probability weight v_k 0 for appearing in a document, and we define transition probability vectors \pi_k Given this global structure, the model generates each document iid. To generate a document j, we use a Gamma dynamical system (with transitions \pi) to obtain a sequence of un-normalized membership "weight embeddings", w_j1 ... w_jT, one for each sentence (indexed by t). Each weight embedding vector w_jt indicates the relative weight of topic k across all words in the sentence t.
Reviews: An Adaptive Empirical Bayesian Method for Sparse Deep Learning
This is a novel combination of existing techniques that appears well-formulated with intriguing experimental results. In particular, this work leverages the strengths stochastic gradient MCMC methods with stochastic approximation to form an adaptive empirical Bayesian approach to learning the parameters and hyperparameters of a Bayesian neural network (BNN). My best understanding is that by optimizing the hyperparameters (rather than sampling them), this new method improves upon existing approaches, speeding up inference without sacrificing quality (especially in the model compression domain). Other areas of BNN literature could be cited, but I think the authors were prudent not to distract the reader from the particular area of focus. This work demonstrates considerable theoretical analysis and is supported by intriguing experimental evidence.