Leave No TRACE: Black-box Detection of Copyrighted Dataset Usage in Large Language Models via Watermarking

Zhang, Jingqi, Chen, Ruibo, Yang, Yingqing, Mai, Peihua, Huang, Heng, Pang, Yan

Oct-6-2025–arXiv.org Artificial Intelligence

Large Language Models (LLMs) are increasingly fine-tuned on smaller, domain-specific datasets to improve downstream performance. Existing membership inference attacks (MIAs) and dataset-inference methods typically require access to internal signals such as logits, while current black-box approaches often rely on handcrafted prompts or a clean reference dataset for calibration, both of which limit practical applicability. Watermarking is a promising alternative, but prior techniques can degrade text quality or reduce task performance. TRACE rewrites datasets with distortion-free watermarks guided by a private key, preserving both text quality and downstream utility. At detection time, we exploit the radioactivity effect of fine-tuning on watermarked data and introduce an entropy-gated procedure that selectively scores high-uncertainty tokens, substantially amplifying detection power. Across diverse datasets and model families, TRACE consistently achieves significant detections (p < 0.05), often with extremely strong statistical evidence. Furthermore, it supports multi-dataset attribution and remains robust even after continued pretraining on large non-watermarked corpora.

Large Language Models (LLMs) have demonstrated strong performance across real-world applications, from conversational agents (Thoppilan et al., 2022) and educational tutoring (Wang et al., 2024) to medical support (Thirunavukarasu et al., 2023). Their capabilities stem from pre-training on massive text corpora (Hoffmann et al., 2022) and, crucially for real deployments, from subsequent fine-tuning on smaller, domain-specific datasets curated by enterprises or individual researchers (Wei et al., 2021).
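The entropy-gated detection idea from the abstract can be illustrated with a minimal sketch. It is not TRACE's actual procedure: as a stand-in for the distortion-free watermark, it assumes a simple keyed "green token" test (an HMAC bit over the previous and current token) and a one-sided binomial test against chance. The function names, the threshold, and the assumption that per-token entropies are supplied by some proxy model are all hypothetical.

```python
import hashlib
import hmac
import math


def is_green(key: bytes, prev_token: str, token: str) -> bool:
    """Keyed pseudo-random bit for a (previous token, token) pair.

    Stand-in for a watermark score; NOT the scheme used in the paper.
    """
    msg = f"{prev_token}\x00{token}".encode("utf-8")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return digest[0] & 1 == 1


def binomial_tail(hits: int, n: int, p: float = 0.5) -> float:
    """Return P[X >= hits] for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(hits, n + 1)
    )


def entropy_gated_pvalue(tokens, entropies, key, threshold):
    """Score only tokens whose entropy is at least `threshold`.

    `entropies[i]` is assumed to be the predictive entropy at position i,
    computed elsewhere (for example by a proxy model).
    Returns (p_value, green_hits, tokens_scored).
    """
    if len(tokens) != len(entropies):
        raise ValueError("tokens and entropies must have the same length")
    scored = 0
    hits = 0
    for i in range(1, len(tokens)):  # position 0 has no previous token
        if entropies[i] < threshold:
            continue  # low-uncertainty token: skipped by the gate
        scored += 1
        hits += is_green(key, tokens[i - 1], tokens[i])
    if scored == 0:
        return 1.0, 0, 0  # nothing scored, so no evidence either way
    return binomial_tail(hits, scored), hits, scored
```

Under this sketch, text from a model that never saw the watermarked data should hit "green" about half the time, so the p-value stays large; a consistent excess of green hits on the gated tokens drives it down. Gating matters because low-entropy tokens are nearly forced by context and carry little watermark signal, so including them only dilutes the test.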

large language model, machine learning, natural language

Country:
- Asia (0.28)

Genre:
- Research Report
  - Experimental Study (0.68)
  - New Finding (0.48)

Industry:
- Information Technology > Security & Privacy (1.00)
- Law (0.93)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)