A Transfer Attack to Image Watermarks

Yuepeng Hu, Zhengyuan Jiang, Moyang Guo, Neil Gong

arXiv.org Artificial Intelligence 

Generative AI (GenAI) can synthesize extremely realistic-looking images, posing growing challenges to information authenticity on the Internet. Watermarking [1-7] was suggested as a key technology to distinguish AI-generated from non-AI-generated content in the Executive Order on AI security issued by the White House in October 2023. In watermark-based detection, a watermark is embedded into an AI-generated image before it is released; an image is then detected as AI-generated if the same watermark can be decoded from it. Watermarking AI-generated images has been widely deployed in industry. For instance, Google's SynthID watermarks images generated by Imagen [8]; OpenAI embeds a watermark into images generated by DALL-E [9]; and Stable Diffusion enables users to embed a watermark into the generated images [10].

An attacker can use evasion attacks [11] to remove the watermark from a watermarked image and thereby evade detection. Specifically, an evasion attack strategically adds a perturbation to a watermarked image such that the target watermark-based detector falsely detects the perturbed image as non-AI-generated. The robustness of watermark-based detectors against evasion attacks is well understood in the literature for the white-box setting (i.e., the attacker has access to the target watermarking model) and the black-box setting (i.e., the attacker has access to the detection API) [11]. Specifically, in the white-box setting, an attacker can find a small perturbation for a given watermarked image such that the perturbed image evades detection while maintaining the image's visual quality; and in the black-box setting, an attacker can repeatedly query the detection API to find such a perturbation.
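
As a concrete illustration of the detection rule described above, the following is a minimal sketch that assumes the watermark is a k-bit string and the decoder is a neural network outputting one logit per bit; the decoder, the bitstring representation, and the threshold tau are illustrative assumptions, not details taken from the paper.

```python
import torch

def detect(decoder: torch.nn.Module,
           image: torch.Tensor,
           watermark: torch.Tensor,
           tau: float = 0.9) -> bool:
    # Decode a k-bit string from the image and compare it with the
    # ground-truth watermark; flag the image as AI-generated when the
    # bitwise accuracy reaches the threshold tau (hypothetical value).
    with torch.no_grad():
        logits = decoder(image)           # assumed to output k logits
        decoded = (logits > 0).float()    # decoded bitstring in {0, 1}^k
    bit_accuracy = (decoded == watermark).float().mean().item()
    return bit_accuracy >= tau
```

Choosing tau trades off false positives against robustness: a higher threshold makes evasion easier but flags fewer non-watermarked images.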
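
The white-box evasion attack can be sketched as projected gradient descent on the perturbation, assuming gradient access to the watermark decoder. The flipped-bit target, the BCE loss, the step size, and the l_inf budget epsilon below are illustrative choices, not the paper's specific attack.

```python
import torch

def evade_whitebox(decoder: torch.nn.Module,
                   image: torch.Tensor,
                   watermark: torch.Tensor,
                   epsilon: float = 0.02,
                   steps: int = 100,
                   lr: float = 0.005) -> torch.Tensor:
    # Push the decoded bits toward the flipped watermark while keeping
    # the perturbation within an l_inf ball of radius epsilon.
    target = 1.0 - watermark
    delta = torch.zeros_like(image, requires_grad=True)
    bce = torch.nn.BCEWithLogitsLoss()
    for _ in range(steps):
        loss = bce(decoder(image + delta), target)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()   # signed gradient step
            delta.clamp_(-epsilon, epsilon)   # project back into the budget
            delta.grad.zero_()
    return (image + delta).detach().clamp(0, 1)
```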
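
In the black-box setting, one illustrative query-based strategy is to start from a distortion strong enough to evade detection and binary-search its strength downward, querying the detection API at each step. The additive-noise distortion model, s_max, and the search procedure are assumptions for illustration, not the paper's method.

```python
import torch

def evade_blackbox(detect_api, image: torch.Tensor,
                   s_max: float = 0.5, iters: int = 20) -> torch.Tensor:
    # Assumes noise of strength s_max already evades the detector; the
    # binary search then shrinks the perturbation while staying evasive.
    noise = torch.randn_like(image)
    lo, hi = 0.0, s_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        candidate = (image + mid * noise).clamp(0, 1)
        if detect_api(candidate):   # still detected: need a stronger distortion
            lo = mid
        else:                       # evaded: try a smaller perturbation
            hi = mid
    return (image + hi * noise).clamp(0, 1)
```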
