Watermarking Language Models with Error Correcting Codes

Patrick Chao, Edgar Dobriban, Hamed Hassani

arXiv.org Artificial Intelligence 

As language model capabilities improve, there are corresponding potential harms such as the creation of misinformation (Zellers et al., 2020) and propaganda (Solaiman et al., 2019). A first step toward mitigating these harms is to detect and filter AI-generated content. A popular approach to reliable detection is to add a watermark (Kirchenbauer et al., 2023; Kuditipudi et al., 2023; Aaronson and Kirchner, 2022; Christ et al., 2023), a hidden signal embedded in the output. While there are exponentially many combinations of words and characters, watermarking biases generation towards specific patterns that are undetectable to humans. We consider the detection setting from the model provider's perspective: the detection algorithm receives (user- or machine-generated) text as input, but no further metadata such as prompts or generation parameters. We do not explore zero-shot or post-hoc methods that classify text as generated by any language model, such as GPT-Zero (Tian and Cui, 2023) and DetectGPT (Mitchell et al., 2023); such model-agnostic detection is inherently challenging because language models are trained to mimic human text (Bender et al., 2021).
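To make the idea of biasing generation toward hidden patterns concrete, the sketch below implements a hash-based "green list" watermark in the spirit of Kirchenbauer et al. (2023), not the error-correcting-code construction of this paper. The vocabulary size, bias strength, and toy sampler are assumptions made purely for illustration; detection needs only the generated tokens and the shared hash, matching the metadata-free setting described above.

```python
# Illustrative "green list" watermark (Kirchenbauer et al., 2023 style), NOT the
# ECC-based scheme proposed in this paper. All constants below are assumptions.
import hashlib
import numpy as np

VOCAB_SIZE = 1000        # assumed toy vocabulary size
GREEN_FRACTION = 0.5     # fraction of tokens marked "green" at each step
BIAS = 2.0               # logit bonus added to green tokens during generation

def green_mask(prev_token: int) -> np.ndarray:
    """Pseudorandomly partition the vocabulary, seeded by the previous token."""
    seed = int.from_bytes(hashlib.sha256(str(prev_token).encode()).digest()[:4], "big")
    rng = np.random.default_rng(seed)
    return rng.random(VOCAB_SIZE) < GREEN_FRACTION

def watermarked_sample(logits: np.ndarray, prev_token: int, rng) -> int:
    """Bias sampling toward the green list; the shift is imperceptible to a reader."""
    biased = logits + BIAS * green_mask(prev_token)
    probs = np.exp(biased - biased.max())
    probs /= probs.sum()
    return int(rng.choice(VOCAB_SIZE, p=probs))

def detect(tokens: list) -> float:
    """Z-score for the green-token count; needs no prompt or model access."""
    hits = sum(green_mask(prev)[tok] for prev, tok in zip(tokens, tokens[1:]))
    n = len(tokens) - 1
    mean = n * GREEN_FRACTION
    var = n * GREEN_FRACTION * (1 - GREEN_FRACTION)
    return (hits - mean) / np.sqrt(var)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random logits stand in for a language model in this toy example.
    tokens = [0]
    for _ in range(200):
        tokens.append(watermarked_sample(rng.normal(size=VOCAB_SIZE), tokens[-1], rng))
    print("watermarked z-score:", round(detect(tokens), 2))                       # large, positive
    print("unwatermarked z-score:", round(detect(list(rng.integers(0, VOCAB_SIZE, 200))), 2))  # near zero
```

Watermarked text over-represents green tokens, so its z-score is large, while human or unwatermarked text stays near zero; this is the kind of statistical signal the detector exploits without access to prompts or generation parameters.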
