Power-SMC: Low-Latency Sequence-Level Power Sampling for Training-Free LLM Reasoning
Azizi, Seyedarmin; Potraghloo, Erfan Baghaei; Ahmadi, Minoo; Kundu, Souvik; Pedram, Massoud
Many recent reasoning gains in large language models can be explained as distribution sharpening: biasing generation toward high-likelihood trajectories already supported by the pretrained model, rather than modifying its weights. A natural formalization is the sequence-level power distribution $\pi_\alpha(y\mid x)\propto p_\theta(y\mid x)^\alpha$ ($\alpha>1$), which concentrates mass on whole sequences instead of adjusting token-level temperature. Prior work shows that Metropolis--Hastings (MH) sampling from this distribution recovers strong reasoning performance, but at the cost of order-of-magnitude inference slowdowns. We introduce Power-SMC, a training-free Sequential Monte Carlo scheme that targets the same objective while remaining close to standard decoding latency. Power-SMC advances a small particle set in parallel, corrects importance weights token-by-token, and resamples when necessary, all within a single GPU-friendly batched decode. We prove that temperature $\tau=1/\alpha$ is the unique prefix-only proposal minimizing incremental weight variance, interpret residual instability via prefix-conditioned Rényi entropies, and introduce an exponent-bridging schedule that improves particle stability without altering the target. On MATH500, Power-SMC matches or exceeds MH power sampling while reducing latency from $16$--$28\times$ to $1.4$--$3.3\times$ that of baseline decoding.
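The core weighting mechanics can be checked on a toy model. By the chain rule the target factorizes as $\pi_\alpha(y\mid x)\propto\prod_t p_\theta(y_t\mid y_{<t},x)^\alpha$, so if each particle extends its prefix with the tempered proposal $q(v\mid y_{<t},x)\propto p_\theta(v\mid y_{<t},x)^\alpha$ (temperature $\tau=1/\alpha$), the incremental importance weight $p_\theta(y_t\mid y_{<t},x)^\alpha/q(y_t\mid y_{<t},x)$ collapses to $Z_\alpha(y_{<t})=\sum_v p_\theta(v\mid y_{<t},x)^\alpha=\exp\big((1-\alpha)H_\alpha(p_\theta(\cdot\mid y_{<t},x))\big)$, where $H_\alpha$ is the Rényi entropy of the next-token distribution. The weight therefore depends only on the prefix, which appears consistent with the prefix-only proposal result and the Rényi-entropy interpretation above. The Python sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the Markov-chain "language model" (`next_token_probs`, `START`, `TRANSITIONS`), the fixed sequence length, multinomial resampling, and the 0.5 ESS threshold are all hypothetical choices, and the exponent-bridging schedule is omitted. It compares the SMC estimate of a marginal against exact enumeration.

```python
import math
import random
from itertools import product

# Hypothetical toy "language model": a first-order Markov chain over a 3-token
# vocabulary. p(v | prefix) depends only on the last token; START covers the
# empty prefix. A real LLM would supply these probabilities from its logits.
START = [0.5, 0.3, 0.2]
TRANSITIONS = [
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.4, 0.4, 0.2],
]


def next_token_probs(prefix):
    """Return p(v | prefix) for every token v of the toy vocabulary."""
    return START if not prefix else TRANSITIONS[prefix[-1]]


def tempered(probs, alpha):
    """Temperature-1/alpha proposal q(v) = p(v)^alpha / Z_alpha, and Z_alpha."""
    powered = [p ** alpha for p in probs]
    z = sum(powered)
    return [x / z for x in powered], z


def renyi_entropy(probs, alpha):
    """Renyi entropy H_alpha(p) = log(sum_v p(v)^alpha) / (1 - alpha), alpha != 1."""
    return math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)


def normalize(log_w):
    """Turn log-weights into normalized weights without overflow."""
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    return [x / total for x in w]


def power_smc(alpha, length, n_particles, seed=0, ess_frac=0.5):
    """SMC sketch targeting pi_alpha(y) proportional to p(y)^alpha.

    Sequences have a fixed length here (an assumption; real decoding stops at
    EOS). Each particle draws its next token from the tempered proposal, and its
    log-weight gains log Z_alpha(prefix) = (1 - alpha) * H_alpha(p(. | prefix)).
    Multinomial resampling fires when ESS < ess_frac * n_particles.
    """
    rng = random.Random(seed)
    particles = [[] for _ in range(n_particles)]
    log_w = [0.0] * n_particles
    for _ in range(length):
        for i, prefix in enumerate(particles):
            q, z = tempered(next_token_probs(prefix), alpha)
            prefix.append(rng.choices(range(len(q)), weights=q)[0])
            log_w[i] += math.log(z)  # p^alpha / q equals Z_alpha for any token
        w = normalize(log_w)
        ess = 1.0 / sum(x * x for x in w)  # effective sample size
        if ess < ess_frac * n_particles:
            idx = rng.choices(range(n_particles), weights=w, k=n_particles)
            particles = [list(particles[j]) for j in idx]
            log_w = [0.0] * n_particles  # resampled particles start equally weighted
    return particles, normalize(log_w)


def exact_first_token_prob(alpha, length, token=0):
    """Exact pi_alpha(y_1 = token) by brute-force enumeration (toy sizes only)."""
    num = den = 0.0
    for seq in product(range(len(START)), repeat=length):
        p = 1.0
        for t, v in enumerate(seq):
            p *= next_token_probs(list(seq[:t]))[v]
        den += p ** alpha
        if seq[0] == token:
            num += p ** alpha
    return num / den


if __name__ == "__main__":
    alpha, length = 4.0, 4
    particles, weights = power_smc(alpha, length, n_particles=5000, seed=1)
    est = sum(w for seq, w in zip(particles, weights) if seq[0] == 0)
    print(f"SMC estimate of pi_alpha(y_1 = 0): {est:.3f}")
    print(f"exact value by enumeration:        {exact_first_token_prob(alpha, length):.3f}")
```

Setting `alpha=1.0` makes every `Z_alpha` equal to 1, so all particles keep equal weight and the sketch reduces to ordinary ancestral sampling; the weight corrections and resampling only matter once $\alpha>1$.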