Fine-tuning can Help Detect Pretraining Data from Large Language Models
Hengxiang Zhang, Songxin Zhang, Bingyi Jing, Hongxin Wei
arXiv.org Artificial Intelligence
In the era of large language models (LLMs), detecting pretraining data has become increasingly important due to concerns about fair evaluation and ethical risks. Current methods differentiate members from non-members by designing scoring functions, such as Perplexity and Min-k%. In this paper, we first explore the benefits of unseen data, which can be easily collected after the release of an LLM. We find that the perplexities of LLMs shift differently for members and non-members after fine-tuning with a small amount of previously unseen data. In light of this, we introduce a novel and effective method, termed Fine-tuned Score Deviation (FSD), which improves the performance of current scoring functions for pretraining data detection. In particular, we propose to measure the deviation of current scores after fine-tuning on a small amount of unseen data from the same domain. In effect, fine-tuning on a few unseen examples largely decreases the scores of non-members, producing a larger deviation for non-members than for members. Extensive experiments demonstrate the effectiveness of our method, which significantly improves the AUC score on common benchmark datasets across various models.

The impressive performance of large language models (LLMs) arises from large-scale pretraining on massive datasets collected from the internet (Achiam et al., 2023; Touvron et al., 2023b). However, model developers are often reluctant to disclose detailed information about their pretraining datasets, raising significant concerns regarding fair evaluation and ethical risks. Specifically, recent studies reveal that the pretraining corpus may inadvertently include data from evaluation benchmarks (Sainz et al., 2023; Balloccu et al., 2024), making it difficult to assess the practical capability of LLMs.
Considering the vast size of the pretraining dataset and the single iteration of pretraining, it has been increasingly important and challenging to detect pretraining data, which determines whether a piece of text is part of the pretraining dataset.
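The core of the FSD idea can be sketched as a simple score comparison: compute a detection score (e.g., perplexity) under the released model and under the same model after fine-tuning on a little unseen in-domain data, then use the drop in score as the detection signal. The sketch below is illustrative only, not the authors' implementation; the per-token log-probabilities are hypothetical stand-ins for the outputs of a base model and its fine-tuned counterpart.

```python
import math

def perplexity(token_log_probs):
    # Perplexity = exp(-mean per-token log-likelihood)
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def fsd(score_base, score_finetuned):
    # Fine-tuned Score Deviation: how much the score drops after
    # fine-tuning on a small amount of unseen in-domain data.
    return score_base - score_finetuned

# Hypothetical per-token log-probs for one text under the base model
# and under the fine-tuned model. Non-members (not in the pretraining
# set) tend to resemble the unseen fine-tuning data, so their
# perplexity drops more after fine-tuning.
member_base, member_ft = [-2.0, -1.8, -2.1], [-1.9, -1.8, -2.0]
nonmember_base, nonmember_ft = [-3.0, -2.9, -3.1], [-2.0, -1.9, -2.1]

d_member = fsd(perplexity(member_base), perplexity(member_ft))
d_nonmember = fsd(perplexity(nonmember_base), perplexity(nonmember_ft))

# A larger deviation suggests a non-member; thresholding this
# deviation (rather than the raw score) is the FSD decision rule.
print(d_member, d_nonmember)
```

In practice the scores would come from actual model forward passes, and the deviation can be applied on top of any base scoring function (Perplexity, Min-k%, etc.); the AUC is then computed over the deviation values instead of the raw scores.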
Oct-9-2024