Inevitable Trade-off between Watermark Strength and Speculative Sampling Efficiency for Language Models
Neural Information Processing Systems
Large language models are probabilistic models: generating content amounts to sampling from the model's output distribution. Existing watermarking techniques inject watermarks into the generated content without degrading output quality. Separately, existing acceleration techniques, specifically speculative sampling, use a smaller draft model to speed up the sampling process while preserving the output distribution. However, no known method simultaneously accelerates sampling and injects a watermark into the generated content. In this paper, we investigate this direction and find that integrating watermarking with acceleration is non-trivial.
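To make the speculative-sampling mechanism referenced above concrete, here is a minimal sketch of its accept/reject step over a toy vocabulary. This is an illustrative reconstruction of the standard technique, not code from the paper; the function name `speculative_step` and the list-based distributions are assumptions for the example. The draft model proposes a token from its distribution q, the target model accepts it with probability min(1, p[x]/q[x]), and on rejection a token is resampled from the renormalized residual max(0, p - q), so the marginal output distribution equals the target p.

```python
import random

def speculative_step(p, q, rng):
    """One speculative-sampling accept/reject step (illustrative sketch).

    p: target-model distribution over a small vocabulary (list of probs)
    q: draft-model distribution over the same vocabulary (list of probs)
    Returns a token index whose marginal distribution equals p.
    """
    # The draft model proposes a token x ~ q.
    x = rng.choices(range(len(q)), weights=q)[0]
    # The target model accepts x with probability min(1, p[x] / q[x]).
    if rng.random() < min(1.0, p[x] / q[x]):
        return x
    # On rejection, resample from the residual max(0, p - q), renormalized.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(p)), weights=residual)[0]
```

In practice the draft model proposes several tokens at once and the target model verifies them in a single forward pass, which is where the speedup comes from; the single-token step above is the core invariant that preserves the output distribution.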
May-27-2025, 03:34:37 GMT