An Unforgeable Publicly Verifiable Watermark for Large Language Models

Liu, Aiwei, Pan, Leyi, Hu, Xuming, Li, Shu'ang, Wen, Lijie, King, Irwin, Yu, Philip S.

Dec-11-2023–arXiv.org Artificial Intelligence

Current watermark detection algorithms require the secret key used in the watermark generation process, making them susceptible to security breaches and counterfeiting during public detection. To address this limitation, we propose an unforgeable publicly verifiable watermark algorithm that uses two different neural networks for watermark generation and detection, instead of using the same key at both stages. The token embedding parameters are shared between the generation and detection networks, which allows the detection network to reach high accuracy efficiently. Experiments demonstrate that our algorithm attains high detection accuracy and computational efficiency through neural networks with a minimized number of parameters. Subsequent analysis confirms the high complexity involved in forging the watermark from the detection network.

Texts generated by LLMs need to be detected and tagged. At present, some watermarking algorithms for LLMs have proved successful in making machine-generated texts detectable by adding implicit features during text generation that are difficult for humans to discover but easily detected by a specially designed method (Christ et al., 2023; Kirchenbauer et al., 2023). Current watermark algorithms for large models use a shared key for both generating and detecting watermarks. They work well when detection access is restricted to the watermark owner. However, in many situations third-party watermark detection is required, and exposing the shared key would enable others to forge the watermark. Preventing watermark forgery in the public detection setting is therefore of great importance. In this work, we propose the first unforgeable publicly verifiable watermarking algorithm for large language models (LLMs).
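The shared-key weakness described above can be illustrated with a minimal sketch. It is not the algorithm proposed in this paper; it is a simplified green-list scheme in the spirit of Kirchenbauer et al. (2023), operating on integer token IDs instead of a real language model. The names `is_green`, `z_score`, and `forge`, the toy vocabulary size, and the SHA-256 based partition are all assumptions made for illustration.

```python
import hashlib
import math
import random

VOCAB_SIZE = 1000      # hypothetical toy vocabulary of integer token IDs
GREEN_FRACTION = 0.5   # expected share of "green" tokens at each position


def is_green(key: bytes, prev_token: int, token: int) -> bool:
    """Keyed pseudo-random vocabulary partition, seeded by the previous token."""
    payload = key + prev_token.to_bytes(4, "big") + token.to_bytes(4, "big")
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64 < GREEN_FRACTION


def z_score(key: bytes, tokens: list) -> float:
    """Detection statistic: how far the green-token count exceeds chance."""
    n = len(tokens) - 1  # number of (previous, current) token pairs
    green = sum(is_green(key, p, t) for p, t in zip(tokens, tokens[1:]))
    expected = GREEN_FRACTION * n
    std = math.sqrt(n * GREEN_FRACTION * (1 - GREEN_FRACTION))
    return (green - expected) / std


def forge(key: bytes, length: int, rng: random.Random) -> list:
    """Anyone who holds the key can emit only green tokens, forging the mark."""
    tokens = [rng.randrange(VOCAB_SIZE)]
    while len(tokens) < length:
        candidate = rng.randrange(VOCAB_SIZE)
        if is_green(key, tokens[-1], candidate):
            tokens.append(candidate)
    return tokens


if __name__ == "__main__":
    rng = random.Random(0)
    key = b"shared-secret"
    forged = forge(key, 200, rng)
    plain = [rng.randrange(VOCAB_SIZE) for _ in range(200)]
    print("forged text z-score:", round(z_score(key, forged), 2))
    print("unmarked text z-score:", round(z_score(key, plain), 2))
```

In this sketch the function that detects the watermark (`z_score`) and the function that forges it (`forge`) need exactly the same secret, so publishing the key for third-party detection also publishes the ability to forge. The approach in the abstract avoids this by giving the public a separate detection network instead of the generation key.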

large language model, machine learning, natural language

Country:
- North America > United States
  - Illinois (0.14)
  - Texas (0.14)

Genre:
- Research Report (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning
      - Neural Networks > Deep Learning (0.71)
      - Performance Analysis > Accuracy (0.46)
    - Natural Language > Large Language Model (1.00)
  - Security & Privacy (1.00)