Constrained Sampling for Language Models Should Be Easy: An MCMC Perspective
Emmanuel Anaya Gonzalez, Sairam Vaidya, Kanghee Park, Ruyi Ji, Taylor Berg-Kirkpatrick, Loris D'Antoni
arXiv.org Artificial Intelligence
Constrained decoding enables Language Models (LMs) to produce samples that provably satisfy hard constraints. However, existing constrained-decoding approaches often distort the underlying model distribution, a limitation that is especially problematic in applications like program fuzzing, where one wants to generate diverse and valid program inputs for testing purposes. We propose a new constrained sampling framework based on Markov Chain Monte Carlo (MCMC) that simultaneously satisfies three core desiderata: constraint-satisfying (every sample satisfies the constraint), monotonically converging (the sampling process converges to the true conditional distribution), and efficient (high-quality samples emerge within a few steps). Our method constructs a proposal distribution over valid outputs and applies a Metropolis-Hastings acceptance criterion based on the LM's likelihood, ensuring principled and efficient exploration of the constrained space. Empirically, our sampler outperforms existing methods on both synthetic benchmarks and real-world program fuzzing tasks.
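The core idea in the abstract — propose only from valid outputs, then accept or reject with a Metropolis-Hastings criterion driven by the LM's likelihood — can be illustrated with a minimal generic sketch. This is not the paper's implementation; the `propose`, `log_p_lm`, and `log_q` callables are hypothetical interfaces standing in for the constrained proposal distribution and the LM scorer.

```python
import math
import random

def mh_constrained_sampler(propose, log_p_lm, log_q, x0, steps=100):
    """Generic Metropolis-Hastings loop over a constrained output space.

    propose(x)   -> a new candidate y, valid by construction (the proposal
                    only ever emits constraint-satisfying outputs)
    log_p_lm(x)  -> LM log-likelihood of x (target density, unnormalized)
    log_q(y, x)  -> log proposal density of moving from x to y
    """
    x = x0
    samples = [x]
    for _ in range(steps):
        y = propose(x)
        # Acceptance ratio: target ratio times proposal-asymmetry correction.
        log_alpha = (log_p_lm(y) - log_p_lm(x)) + (log_q(x, y) - log_q(y, x))
        if math.log(random.random()) < min(0.0, log_alpha):
            x = y  # accept; otherwise keep the current valid sample
        samples.append(x)
    return samples
```

Because every proposal is valid and rejection keeps the previous valid state, every element of the chain satisfies the constraint, while the MH correction makes the chain converge to the LM's distribution conditioned on validity.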
Jun-9-2025