Watermarking Makes Language Models Radioactive
Tom Sander

Neural Information Processing Systems 

Current methods such as membership inference or active IP protection either work only when the suspected text is known or do not provide reliable statistical guarantees. We discover that, in contrast, it is possible to reliably determine whether a language model was trained on synthetic data when that data was generated by a watermarked LLM.
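The "reliable statistical guarantees" mentioned above come from hypothesis testing: a watermark detector returns a p-value for the null hypothesis that a text carries no watermark. The excerpt does not say which watermarking scheme is used, so the sketch below assumes a green-list style scheme, in which a secret key marks a fraction `gamma` of (previous token, next token) pairs as "green" and watermarked generation favors green tokens. The names `is_green`, `detect_watermark`, and `toy_watermarked_tokens`, the per-pair SHA-256 hash (a simplification of seeding a vocabulary partition), and all parameter values are hypothetical. This is an illustrative z-test under those assumptions, not the paper's detection procedure.

```python
import hashlib
import math
import random
from typing import List, Sequence, Tuple


def is_green(prev_token: int, token: int, key: int, gamma: float) -> bool:
    """Hypothetical keyed test: is `token` green given `prev_token`?

    Simplification (assumption): hash (key, prev_token, token) to a number in
    [0, 1) and call the pair green when it falls below gamma. Without the key,
    each pair looks green with probability gamma.
    """
    digest = hashlib.sha256(f"{key}:{prev_token}:{token}".encode()).digest()
    u = int.from_bytes(digest[:8], "big") / 2**64
    return u < gamma


def detect_watermark(
    tokens: Sequence[int], key: int, gamma: float = 0.25
) -> Tuple[int, int, float, float]:
    """Return (green_count, scored_pairs, z_score, one_sided_p_value)."""
    # Score each (previous token, token) pair only once: repeated pairs would
    # otherwise count as independent evidence and inflate the z-score.
    pairs = set(zip(tokens[:-1], tokens[1:]))
    n = len(pairs)
    if n == 0:
        return 0, 0, 0.0, 1.0
    green = sum(is_green(p, t, key, gamma) for p, t in pairs)
    # Under the null hypothesis, green ~ Binomial(n, gamma); use the normal
    # approximation to turn the green count into a z-score.
    z = (green - gamma * n) / math.sqrt(n * gamma * (1 - gamma))
    # One-sided p-value P(Z >= z) for a standard normal Z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return green, n, z, p_value


def toy_watermarked_tokens(
    length: int, key: int, gamma: float = 0.25, vocab: int = 1000, seed: int = 0
) -> List[int]:
    """Hypothetical toy generator that only ever emits green tokens."""
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab)]
    while len(tokens) < length:
        candidate = rng.randrange(vocab)
        if is_green(tokens[-1], candidate, key, gamma):
            tokens.append(candidate)
    return tokens


if __name__ == "__main__":
    marked = toy_watermarked_tokens(300, key=42)
    # With the right key every scored pair is green, so the p-value is tiny.
    print(detect_watermark(marked, key=42))
    # With a different key the green count behaves like chance.
    print(detect_watermark(marked, key=7))
```

In the radioactivity setting described above, such a test would be run on text produced by the suspected model rather than on the watermarked data itself; a small p-value is then evidence that watermark traces survived training.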
