No Free Lunch in LLM Watermarking: Trade-offs in Watermarking Design Choices
Neural Information Processing Systems
Advances in generative models have made it possible for AI-generated text, code, and images to mirror human-generated content in many applications. Watermarking, a technique that aims to embed information in the output of a model to verify its source, is useful for mitigating the misuse of such AI-generated content. However, we show that common design choices in LLM watermarking schemes make the resulting systems surprisingly susceptible to attack, leading to fundamental trade-offs in robustness, utility, and usability. To navigate these trade-offs, we rigorously study a set of simple yet effective attacks on common watermarking systems, and propose guidelines and defenses for LLM watermarking in practice.
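As background for the kind of scheme the abstract discusses, a widely used family of LLM watermarks is the green/red-list approach: a private key pseudorandomly partitions the vocabulary at each step (seeded by the preceding token), generation is biased toward "green" tokens, and detection tests whether a text contains more green tokens than chance would predict. The sketch below is illustrative only, not the paper's method; the key, vocabulary, green-list fraction `GAMMA`, and helper names are all assumptions for the example.

```python
import hashlib
import math

GAMMA = 0.5  # assumed fraction of the vocabulary placed on the green list


def is_green(prev_token: int, token: int, key: str = "secret") -> bool:
    """Pseudorandomly assign `token` to the green list, seeded by the
    previous token and a private key (a common watermark design)."""
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    return digest[0] < 256 * GAMMA


def detection_z_score(tokens: list[int], key: str = "secret") -> float:
    """z-score of the observed green-token count against the null
    hypothesis that tokens land on the green list with probability GAMMA.
    Large positive values suggest the text is watermarked."""
    n = len(tokens) - 1
    greens = sum(is_green(p, t, key) for p, t in zip(tokens, tokens[1:]))
    return (greens - GAMMA * n) / math.sqrt(n * GAMMA * (1 - GAMMA))
```

Under this design, the trade-offs the abstract highlights become concrete: paraphrasing or re-tokenizing the text perturbs the (prev_token, token) pairs the detector hashes, trading robustness against the strength of the generation-time bias.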
Mar-27-2025, 15:56:29 GMT