The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)

Zeng, Shenglai; Zhang, Jiankun; He, Pengfei; Xing, Yue; Liu, Yiding; Xu, Han; Ren, Jie; Wang, Shuaiqiang; Yin, Dawei; Chang, Yi; Tang, Jiliang

Feb-23-2024–arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) (... 2023; Shi et al., 2023) is an advanced natural language processing technique that enhances text generation by integrating information retrieved from a large corpus of documents. These techniques enable RAG to produce accurate and contextually relevant outputs with augmented external knowledge and have been widely used in various scenarios such as domain-specific chatbots (Siriwardhana et al., 2023) and email/code completion (Parvez et al., 2021). RAG systems typically work in two phases, as shown in Fig. 1: retrieval and generation.

On the other hand, the retrieval process in RAG could also influence the behavior of the LLMs for text generation, and this could possibly cause the LLMs to output private information from their training/fine-tuning dataset. Notably, there are existing works (Carlini et al., 2021; Kandpal et al., 2022; Lee et al., 2021; Carlini et al., 2022; Zeng et al., 2023) observing that LLMs can remember and leak private information from their pre-training and fine-tuning data. However, how the integration of external retrieval ...
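
The two phases named in the snippet, retrieval and then generation, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word-overlap scoring, the prompt template, and the names `tokenize`, `retrieve`, `build_prompt`, and `rag_answer` are all assumptions made for illustration, and `llm` stands in for any text-generation function.

```python
from typing import Callable, List, Set


def tokenize(text: str) -> Set[str]:
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return set(text.lower().split())


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Phase 1 (retrieval): rank documents by word overlap with the query.

    Real systems usually rely on dense embeddings or BM25; plain word
    overlap is a hypothetical stand-in that keeps the sketch dependency-free.
    """
    query_tokens = tokenize(query)
    ranked = sorted(
        corpus,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(query: str, docs: List[str]) -> str:
    """Place the retrieved documents in front of the query as context."""
    context = "\n".join(f"- {doc}" for doc in docs)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def rag_answer(
    query: str,
    corpus: List[str],
    llm: Callable[[str], str],
    k: int = 2,
) -> str:
    """Phase 2 (generation): pass the augmented prompt to a text generator."""
    docs = retrieve(query, corpus, k)
    return llm(build_prompt(query, docs))
```

In this sketch the retrieved documents are copied into the prompt verbatim, so whatever the retrieval corpus contains becomes part of the model's input.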

information, large language model, machine learning


Country:
- Asia (0.28)

Genre:
- Research Report > New Finding (1.00)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning > Neural Networks
    - Deep Learning (0.83)
  - Natural Language
    - Chatbot (1.00)
    - Large Language Model (1.00)
