Discovering Clues of Spoofed LM Watermarks
Gloaguen, Thibaud, Jovanović, Nikola, Staab, Robin, Vechev, Martin
–arXiv.org Artificial Intelligence
LLM watermarks stand out as a promising way to attribute ownership of LLMgenerated text. One threat to watermark credibility comes from spoofing attacks, where an unauthorized third party forges the watermark, enabling it to falsely attribute arbitrary texts to a particular LLM. While recent works have demonstrated that state-of-the-art schemes are in fact vulnerable to spoofing, they lack deeper qualitative analysis of the texts produced by spoofing methods. In this work, we for the first time reveal that there are observable differences between genuine and spoofed watermark texts. Namely, we show that regardless of their underlying approach, all current spoofing methods consistently leave observable artifacts in spoofed texts, indicative of watermark forgery. We build upon these findings to propose rigorous statistical tests that reliably reveal the presence of such artifacts, effectively discovering that a watermark was spoofed. Our experimental evaluation shows high test power across all current spoofing methods, providing insights into their fundamental limitations, and suggesting a way to mitigate this threat. The improving abilities of large language models (LLMs) to generate human-like text at scale (Bubeck et al., 2023; Dubey et al., 2024) come with a growing risk of potential misuse. Hence, reliable detection of machine-generated text becomes increasingly important. Researchers have proposed the concept of watermarking: augmenting generated text with an imperceptible signal that can later be detected to attribute ownership of a text to a specific LLM (Kirchenbauer et al., 2023; Kuditipudi et al., 2023; Christ et al., 2024). Major LLM companies have pledged to watermark their models (Bartz & Hu, 2023), and regulators actively advocate for their use (Biden, 2023; CEU, 2024).
arXiv.org Artificial Intelligence
Oct-3-2024
- Country:
- North America > United States (0.46)
- Genre:
- Research Report (1.00)
- Industry:
- Government > Regional Government (0.46)
- Information Technology > Security & Privacy (0.70)
- Technology: