Speculative Decoding for Multi-Sample Inference
Li, Yiwei, Shi, Jiayi, Feng, Shaoxiong, Yuan, Peiwen, Wang, Xinglin, Zhang, Yueqi, Zhang, Ji, Tan, Chuyi, Pan, Boyuan, Hu, Yao, Li, Kan
–arXiv.org Artificial Intelligence
We propose a novel speculative decoding method tailored for multi-sample reasoning scenarios, such as self-consistency and Best-of-N sampling. Our method exploits the intrinsic consensus of parallel generation paths to synthesize high-quality draft tokens without requiring auxiliary models or external databases. By dynamically analyzing structural patterns across parallel reasoning paths through a probabilistic aggregation mechanism, it identifies consensus token sequences that align with the decoding distribution. Evaluations on mathematical reasoning benchmarks demonstrate a substantial improvement in draft acceptance rates over baselines, while reducing the latency in draft token construction. This work establishes a paradigm shift for efficient multi-sample inference, enabling seamless integration of speculative decoding with sampling-based reasoning techniques.
arXiv.org Artificial Intelligence
Mar-7-2025
- Country:
- North America
- United States
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Louisiana > Orleans Parish
- Mexico > Mexico City
- Mexico City (0.04)
- Canada
- Ontario > Toronto (0.04)
- British Columbia > Metro Vancouver Regional District
- Vancouver (0.04)
- United States
- Europe > Austria
- Vienna (0.15)
- Asia
- Singapore (0.04)
- British Indian Ocean Territory > Diego Garcia (0.04)
- Thailand > Bangkok
- Bangkok (0.05)
- China > Beijing
- Beijing (0.04)
- Africa > Rwanda
- North America
- Genre:
- Research Report (0.82)
- Technology: