Towards Codable Watermarking for Injecting Multi-bit Information to LLMs

Lean Wang, Wenkai Yang, Deli Chen, Hao Zhou, Yankai Lin, Fandong Meng, Jie Zhou, Xu Sun

arXiv.org Artificial Intelligence 

As large language models (LLMs) generate text with increasing fluency and realism, there is a growing need to identify the source of a text to prevent the abuse of LLMs. Text watermarking techniques have proven reliable for distinguishing whether a text is generated by an LLM, by injecting hidden patterns into the generated text. However, we argue that existing watermarking methods for LLMs are encoding-inefficient (they carry only one bit of information: whether the text is generated by an LLM or not) and cannot flexibly meet the diverse information-encoding needs (such as encoding the model version, generation time, user ID, etc.) of different LLM application scenarios. In this work, we conduct the first systematic study of Codable Text Watermarking for LLMs (CTWL), which allows text watermarks to carry more customizable information. We first study the taxonomy of LLM watermarking technologies and give a mathematical formulation for CTWL. We then provide a comprehensive evaluation system for CTWL: (1) watermarking success rate, (2) robustness against various corruptions, (3) coding rate of payload information, (4) encoding and decoding efficiency, and (5) impact on the quality of the generated text. To meet the requirements of these non-Pareto-improving metrics, we devise a CTWL method named Balance-Marking, motivated by ensuring that the vocabulary parts available and unavailable for encoding information carry approximately equal probability mass. Compared to the random vocabulary partitioning extended from existing work, a probability-balanced vocabulary partition significantly improves the quality of the generated text. Extensive experimental results show that our method outperforms a direct baseline under this comprehensive evaluation. We hope this work raises the community's awareness of the importance of CTWL and inspires further research on designing more efficient, practical, and robust watermarking methods for LLMs.

Recently, with the explosive development of Large Language Models (LLMs) (OpenAI, 2022; Touvron et al., 2023), there has been growing concern in the community about the potential negative effects of AI-generated content (AIGC).
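To make the probability-balanced partitioning idea behind Balance-Marking more concrete, below is a minimal, self-contained sketch that contrasts a seeded random two-way vocabulary split (a direct multi-bit extension of one-bit watermarking) with a greedily balanced one. The toy Zipf-like distribution, the greedy heuristic, and all function names are illustrative assumptions made for this sketch, not the paper's actual implementation.

```python
# Minimal NumPy-only sketch of probability-balanced vocabulary partitioning.
# Everything here is an illustrative assumption, not the paper's method.
import numpy as np

VOCAB_SIZE = 1000
rng = np.random.default_rng(0)

# Toy next-token distribution: heavily skewed, like a real LM's output.
weights = 1.0 / np.arange(1, VOCAB_SIZE + 1) ** 1.5
probs = weights / weights.sum()

def random_partition(seed: int) -> np.ndarray:
    """Seeded random 2-way split of the vocabulary into parts 0 and 1."""
    return np.random.default_rng(seed).integers(0, 2, size=VOCAB_SIZE)

def balanced_partition(probs: np.ndarray) -> np.ndarray:
    """Greedy heuristic: walk tokens from most to least probable and put
    each one in the half with less accumulated probability mass, so both
    halves end up with roughly 50% of the distribution."""
    labels = np.zeros(VOCAB_SIZE, dtype=int)
    mass = np.zeros(2)
    for tok in np.argsort(-probs):
        part = int(np.argmin(mass))
        labels[tok] = part
        mass[part] += probs[tok]
    return labels

def encode_bit(probs: np.ndarray, labels: np.ndarray, bit: int) -> int:
    """Embed one payload bit by sampling only from that bit's half."""
    restricted = np.where(labels == bit, probs, 0.0)
    restricted /= restricted.sum()
    return int(rng.choice(VOCAB_SIZE, p=restricted))

for name, labels in [("random", random_partition(seed=42)),
                     ("balanced", balanced_partition(probs))]:
    halves = [probs[labels == b].sum() for b in (0, 1)]
    print(f"{name:>8} partition mass per half: {np.round(halves, 3)}")

token = encode_bit(probs, balanced_partition(probs), bit=1)
print("sampled token id carrying bit 1:", token)
```

On a skewed distribution like this one, a random split can leave one half with far more probability mass than the other, so forcing generation into the low-mass half badly distorts the model's distribution; the balanced split keeps both halves near 50%, which is the intuition behind the quality gains reported for Balance-Marking. In the paper's actual setting the partition must be recomputable by the decoder (e.g., from a shared proxy model and seed) so that each emitted token's half can be read back as a payload bit; the static probs array above merely stands in for that.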