PII-Compass: Guiding LLM training data extraction prompts towards the target PII via grounding
Nakka, Krishna Kanth, Frikha, Ahmed, Mendes, Ricardo, Jiang, Xue, Zhou, Xuebing
arXiv.org Artificial Intelligence
Hereby, we investigate over 100 hand-crafted and synthetically generated prompts and find that the correct PII is extracted in less than 1% of cases. In contrast, using the true prefix of the target PII as a single query yields extraction rates of up to 6%. Second, we propose PII-Compass, a novel method that achieves a substantially higher extraction rate than simple adversarial prompts. Our approach is based on the intuition that querying the model with a prompt that has a close embedding to the embedding of the target piece of data, i.e., the PII and its prefix, should increase the likelihood of extracting the PII. We do this by prepending the hand-crafted prompt with a true prefix of a different data subject than the targeted data subject.

Memorization in Large Language Models (LLMs) has recently enjoyed a surge of interest (Hartmann et al., 2023), ranging from memorization localization (Maini et al., 2023) and quantification (Carlini et al., 2022) to controlling (Ozdayi et al., 2023) and auditing (Zhang et al., 2023a). The major reason for this is the risk of training data extraction (Carlini et al., 2021; Ishihara, 2023). To assess this risk, various methods have been proposed in prior work (Yu et al., 2023; Zhang et al., 2023b; Panda et al., 2024; Wang et al., 2024). In this work, we aim to assess the privacy leakage risk of a subclass of training data, namely personally identifiable information (PII).
Jul-3-2024