PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding
Nakka, Krishna Kanth, Frikha, Ahmed, Mendes, Ricardo, Jiang, Xue, Zhou, Xuebing
arXiv.org Artificial Intelligence
Hereby, we investigate over 100 hand-crafted and synthetically generated prompts and find that the correct PII is extracted in less than 1% of cases. In contrast, using the true prefix of the target PII as a single query yields extraction rates of up to 6%. Second, we propose PII-Compass, a novel method that achieves a substantially higher extraction rate than simple adversarial prompts. Our approach is based on the intuition that querying the model with a prompt that has a close embedding to the embedding of the target piece of data, i.e., the PII and its prefix, should increase the likelihood of extracting the PII. We do this by prepending the hand-crafted prompt with a true prefix of a different data subject than the targeted data subject.

Memorization in Large Language Models (LLMs) has recently enjoyed a surge of interest (Hartmann et al., 2023), ranging from memorization localization (Maini et al., 2023) and quantification (Carlini et al., 2022) to controlling (Ozdayi et al., 2023) and auditing (Zhang et al., 2023a). The major reason for this is the risk of training data extraction (Carlini et al., 2021; Ishihara, 2023). To assess this risk, various methods have been proposed in prior work (Yu et al., 2023; Zhang et al., 2023b; Panda et al., 2024; Wang et al., 2024). In this work, we aim to assess the privacy leakage risk of a subclass of training data, namely personally identifiable information (PII).
Jul-3-2024