Cactus: Accelerating Auto-Regressive Decoding with Constrained Acceptance Speculative Sampling
