Towards Better Statistical Understanding of Watermarking LLMs
Zhongze Cai, Shang Liu, Hanzhao Wang, Huaiyang Zhong, Xiaocheng Li
As the capabilities of large language models (LLMs) evolve rapidly, their applications have reached nearly every corner of daily life. However, these fast-developing tools also raise concerns about misuse. Misuse of LLMs can harm society in ways such as running bots on social media, creating fake news and content, and cheating on school essays. The flood of synthetic data created by LLMs rather than real humans also drags down efforts to improve the LLMs themselves: synthetic data pollutes the data pool and should be detected and removed to create a high-quality dataset before training (Radford et al., 2023).

Numerous attempts have been made to make such detection possible, and they fall mainly into two categories: post hoc detection, which does not modify the language model, and watermarking, which alters the output to encode information in the generated content. Post hoc detection aims to train models that directly label texts without monitoring the generation process. Although post hoc methods do not require access to modify the output of LLMs, they do exploit statistical features such as the internal activations of the LLMs. For example, when inspected by another LLM, the statistical properties of machine-generated texts deviate from those of human-generated texts in aspects such as the distribution of token log-likelihoods (Gehrmann et al., 2019; Ippolito et al., 2019; Zellers et al., 2019; Solaiman et al., 2019; Tian, 2023; Mitchell et al., 2023). However, post hoc methods typically rest on the fundamental assumption that machine-generated texts statistically deviate from human-generated texts, an assumption that can be challenged in two ways.
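To make the log-likelihood signal concrete, here is a minimal sketch of the post hoc scoring idea described above: score a candidate text with a public causal LM and report its average per-token log-likelihood. The choice of GPT-2 as the scorer and the function name `mean_log_likelihood` are illustrative assumptions, not the method of this paper.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Any causal LM can serve as the scorer; GPT-2 is an arbitrary small choice.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

@torch.no_grad()
def mean_log_likelihood(text: str) -> float:
    """Average per-token log-likelihood of `text` under the scoring model."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    # With labels=input_ids, the model returns the mean cross-entropy
    # (i.e., negative log-likelihood) over shifted next-token predictions.
    loss = model(ids, labels=ids).loss
    return -loss.item()
```

Machine-generated text tends to receive a higher average log-likelihood (lower perplexity) than human-written text, which is the statistical deviation exploited by detectors such as those of Gehrmann et al. (2019) and Mitchell et al. (2023).

For the watermarking category, one widely studied construction (the green-list scheme of Kirchenbauer et al., 2023, named here only as an illustration and not necessarily the scheme this paper analyzes) biases the sampling distribution toward a pseudorandom subset of the vocabulary seeded by the preceding token; a detector who knows the seeding rule can later recount "green" tokens without model access. A hedged sketch, with hypothetical parameter names `gamma` (green fraction) and `delta` (bias strength):

```python
import torch

def greenlist_bias(logits: torch.Tensor, prev_token: int, vocab_size: int,
                   gamma: float = 0.5, delta: float = 2.0) -> torch.Tensor:
    """Add bias `delta` to a pseudorandom 'green' fraction of the vocabulary.

    The green list is derived deterministically from the previous token, so
    detection reduces to recounting green tokens and running a z-test on
    their observed fraction against the expected `gamma`.
    """
    gen = torch.Generator().manual_seed(prev_token)  # seed on prior token
    perm = torch.randperm(vocab_size, generator=gen)
    green = perm[: int(gamma * vocab_size)]
    biased = logits.clone()
    biased[green] += delta  # nudge sampling toward the green list
    return biased
```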
March 18, 2024