Gibbs Sampling
Poisson-Minibatching for Gibbs Sampling with Convergence Rate Guarantees
Gibbs sampling is a Markov chain Monte Carlo method that is often used for learning and inference on graphical models. Minibatching, in which a small random subset of the graph is used at each iteration, can help make Gibbs sampling scale to large graphical models by reducing its computational cost. In this paper, we propose a new auxiliary-variable minibatched Gibbs sampling method, *Poisson-minibatching Gibbs*, which both produces unbiased samples and has a theoretical guarantee on its convergence rate. In comparison to previous minibatched Gibbs algorithms, Poisson-minibatching Gibbs supports fast sampling from continuous state spaces and avoids the need for a Metropolis-Hastings correction on discrete state spaces. We demonstrate the effectiveness of our method on multiple applications and in comparison with both plain Gibbs and previous minibatched methods.
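The paper's auxiliary-variable construction is not reproduced here, but the cost that minibatching targets is easy to see in a plain Gibbs sampler: every conditional update must evaluate all factors adjacent to the variable being resampled. A minimal sketch on a 1-D Ising chain (the parameters `n` and `theta` are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

n, theta = 50, 0.4          # chain length and coupling strength (illustrative)

def gibbs_sweep(x):
    """One systematic sweep of plain Gibbs on a 1-D Ising chain.

    Each conditional touches every factor adjacent to x[i]; on dense graphs
    this full neighbor sum is the per-update cost that minibatching reduces
    by evaluating only a random subset of factors.
    """
    for i in range(n):
        field = theta * (x[i - 1] if i > 0 else 0) + \
                theta * (x[i + 1] if i < n - 1 else 0)
        p_up = 1.0 / (1.0 + np.exp(-2.0 * field))   # P(x[i] = +1 | neighbors)
        x[i] = 1 if rng.random() < p_up else -1
    return x

x = rng.choice([-1, 1], size=n)
for _ in range(200):
    x = gibbs_sweep(x)
```

The Poisson-minibatching idea replaces the full factor sum with a subsampled one while, unlike naive subsampling, keeping the target distribution exactly invariant.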
Gibbs Sampling with People
A core problem in cognitive science and machine learning is to understand how humans derive semantic representations from perceptual objects, such as color from an apple, pleasantness from a musical chord, or seriousness from a face. Markov Chain Monte Carlo with People (MCMCP) is a prominent method for studying such representations, in which participants are presented with binary choice trials constructed such that the decisions follow a Markov Chain Monte Carlo acceptance rule. However, while MCMCP has strong asymptotic properties, its binary choice paradigm generates relatively little information per trial, and its local proposal function makes it slow to explore the parameter space and find the modes of the distribution. Here we therefore generalize MCMCP to a continuous-sampling paradigm, where in each iteration the participant uses a slider to continuously manipulate a single stimulus dimension to optimize a given criterion such as 'pleasantness'. We formulate both methods from a utility-theory perspective, and show that the new method can be interpreted as 'Gibbs Sampling with People' (GSP). Further, we introduce an aggregation parameter to the transition step, and show that this parameter can be manipulated to flexibly shift between Gibbs sampling and deterministic optimization. In an initial study, we show GSP clearly outperforming MCMCP; we then show that GSP provides novel and interpretable results in three other domains, namely musical chords, vocal emotions, and faces.
Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much
Gibbs sampling is a Markov Chain Monte Carlo sampling technique that iteratively samples variables from their conditional distributions. There are two common scan orders for the variables: random scan and systematic scan. Due to the benefits of locality in hardware, systematic scan is commonly used, even though most statistical guarantees are only for random scan. While it has been conjectured that the mixing times of random scan and systematic scan do not differ by more than a logarithmic factor, we show by counterexample that this is not the case, and we prove that the mixing times do not differ by more than a polynomial factor under mild conditions. To prove these relative bounds, we introduce a method of augmenting the state space to study systematic scan using conductance.
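The two scan orders differ only in how the coordinate to update is chosen at each step. A minimal sketch on a correlated 2-D Gaussian target (the target and the parameter `rho` are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
rho = 0.8   # correlation of a standard bivariate Gaussian target

def step(x, d):
    # Exact Gaussian conditional of dimension d given the other dimension
    x[d] = rng.normal(rho * x[1 - d], np.sqrt(1 - rho**2))

def systematic_scan(iters):
    """Sweep the coordinates in a fixed, deterministic order each iteration."""
    x, out = np.zeros(2), []
    for _ in range(iters):
        for d in (0, 1):
            step(x, d)
        out.append(x.copy())
    return np.array(out)

def random_scan(iters):
    """Update one uniformly chosen coordinate per iteration."""
    x, out = np.zeros(2), []
    for _ in range(iters):
        step(x, rng.integers(2))
        out.append(x.copy())
    return np.array(out)
```

Both chains target the same distribution; the paper's question is how their mixing times relate, which this toy example does not settle but makes concrete.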
Review for NeurIPS paper: Gibbs Sampling with People
Weaknesses: Overall, I thought this was a strong paper. The main concerns I had were as follows: (1) Mode-seeking versus showing the distribution: The aggregated results in the first experiment seem to show much more homogeneity than the results for GSP or MCMCP. It seems like one limitation of this approach might be that there is limited exploration of the space, perhaps making it hard to move between modes and making it more difficult to see the full shape of the distribution, which I have often taken to be a goal in work using MCMCP. The movement between optimization and seeking a distribution is discussed to some extent in the paper, but I would be interested in seeing this discussed more (and perhaps whether GSP without aggregation is likely to lead to more optimization than MCMCP). In the author response, they have shown additional information suggesting that GSP is more mode-seeking but also does a better job of capturing the distribution.
Review for NeurIPS paper: Gibbs Sampling with People
This paper introduces a new method for eliciting human representations of perceptual concepts, such as what RGB values people think correspond to the color "sunset" or which auditory dimensions (e.g., …) correspond to a given concept. Rather than eliciting representations via guess-and-check (i.e., start with a dataset and then apply human-generated labels), this method (Gibbs Sampling with People, or GSP) enables inference to go in the other direction (i.e., start with labels, and then identify percepts that match those labels). GSP extends prior work (MCMC with People) to allow eliciting representations of much higher-dimensional stimuli. The reviewers unanimously praised this paper for tackling an important and relevant problem in cognitive science, for its breadth of empirical results, and for its novelty over prior work. R2 stated that the paper is "impressive in scale, scope, and results", R3 stated that it was "very relevant to the NeurIPS community and very novel", and R4 felt there could be "a potentially large impact of this work" with "substantial interest" amongst the NeurIPS community.
Reviews: Poisson-Minibatching for Gibbs Sampling with Convergence Rate Guarantees
Summary: This paper introduces Poisson auxiliary variables to facilitate minibatch sampling. The key insight is that, with the appropriate Poisson parameterization, the joint distribution (Eq. …) retains the target as its marginal, so the resulting samples are unbiased. The authors apply this insight to discrete-state Gibbs sampling (Algorithm 2), Metropolis-Hastings (Supplement), and continuous-state Gibbs sampling (Algorithms 3 and 5). The authors also develop spectral-gap lower bounds for all proposed Gibbs sampling methods, which provide a rough guideline for choosing the tuning parameter \lambda and for comparing the (asymptotic) per-iteration runtime of the methods (Table 1). Finally, the authors evaluate the Gibbs methods on synthetic data, showing that their proposed method performs similarly to plain Gibbs while outperforming alternatives.
Reviews: Scan Order in Gibbs Sampling: Models in Which it Matters and Bounds on How Much
I think this paper addresses an important issue and makes valuable contributions, and thus should be published. I have a few concerns, hence my lower rating for the last question above (which I think could be addressed relatively easily, however). I think this is fundamentally *OK* and even perhaps a positive thing. However, I think a bit more discussion needs to be given to how the arguments might be made more formal. For example, in Section 2.1, I think the proof is intended to hold only in the limit of M going to infinity. Please give a statement of what should hold in what limit; this wasn't clear to me.