Advancing Beyond Identification: Multi-bit Watermark for Large Language Models
KiYoon Yoo, Wonhyuk Ahn, Nojun Kwak
arXiv.org Artificial Intelligence
We propose a method to tackle misuses of large language models that goes beyond the identification of machine-generated text. While existing methods focus on detection, some malicious misuses demand tracing the adversarial user in order to counteract them. To address this, we introduce Multi-bit Watermark via Position Allocation, which embeds traceable multi-bit information during language model generation. Leveraging the benefits of zero-bit watermarking, our method simultaneously enables robust extraction of the watermark without any model access, embeds and extracts long messages ($\geq$ 32 bits) without finetuning, and maintains text quality, while still allowing zero-bit detection. Moreover, our watermark is relatively robust under strong attacks such as interleaving human-written text and paraphrasing.
Sep-27-2023
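The abstract does not spell out the mechanism, so the following is only a toy sketch of one possible reading of "position allocation" layered on a greenlist-style zero-bit watermark. It assumes that the previous token seeds a PRNG, which then picks the message position a token carries and splits the vocabulary into two colorlists (one per bit value). Generation boosts the colorlist matching that bit by `delta`. Extraction re-derives the position and colorlists from the key and token ids alone and takes a majority vote per position. All names (`VOCAB_SIZE`, `SECRET_KEY`, `step_setup`, `generate`, `extract`) and the uniform stand-in "language model" are hypothetical. This is not the authors' implementation.

```python
import math
import random

VOCAB_SIZE = 500    # hypothetical toy vocabulary size
SECRET_KEY = 12345  # hypothetical key shared by embedder and extractor


def step_setup(prev_token, num_positions, key=SECRET_KEY):
    """Derive (message position, colorlist of every token) from the previous token.

    Assumption: a seeded PRNG stands in for a keyed hash. The result depends only
    on the key and prev_token, so the extractor can recompute it without a model.
    """
    rng = random.Random(key * 1_000_003 + prev_token)
    position = rng.randrange(num_positions)
    perm = list(range(VOCAB_SIZE))
    rng.shuffle(perm)
    color = [0] * VOCAB_SIZE           # first half of the permutation -> colorlist 0
    for tok in perm[VOCAB_SIZE // 2:]:
        color[tok] = 1                 # second half -> colorlist 1
    return position, color


def generate(message_bits, length, delta=4.0, seed=0):
    """Sample from a uniform toy 'LM', adding delta to the logits of the favored colorlist."""
    rng = random.Random(seed)
    vocab = range(VOCAB_SIZE)
    boost = math.exp(delta)            # exp(logit + delta) / exp(logit) for uniform logits
    tokens = [rng.randrange(VOCAB_SIZE)]  # arbitrary first token
    for _ in range(length):
        position, color = step_setup(tokens[-1], len(message_bits))
        favored = message_bits[position]  # the bit this token is allocated to carry
        weights = [boost if color[t] == favored else 1.0 for t in vocab]
        tokens.append(rng.choices(vocab, weights=weights)[0])
    return tokens


def extract(tokens, num_positions):
    """Recover each bit by majority vote over the tokens allocated to that position."""
    counts = [[0, 0] for _ in range(num_positions)]
    for prev, tok in zip(tokens, tokens[1:]):
        position, color = step_setup(prev, num_positions)
        counts[position][color[tok]] += 1
    return [0 if c0 >= c1 else 1 for c0, c1 in counts]  # ties default to 0


msg_rng = random.Random(7)
message = [msg_rng.randrange(2) for _ in range(32)]  # a 32-bit message
tokens = generate(message, length=800)
decoded = extract(tokens, num_positions=len(message))
print("message recovered:", decoded == message)
```

`extract` only needs the key and the token ids, which mirrors the abstract's claim of extraction without model access. A real system would apply the bias to a model's logits over tokenizer ids rather than to a uniform toy distribution.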