The Good and The Bad: Exploring Privacy Issues in Retrieval-Augmented Generation (RAG)

Shenglai Zeng, Jiankun Zhang, Pengfei He, Yue Xing, Yiding Liu, Han Xu, Jie Ren, Shuaiqiang Wang, Dawei Yin, Yi Chang, Jiliang Tang

arXiv.org Artificial Intelligence 

Retrieval-Augmented Generation (RAG) (…, 2023; Shi et al., 2023) is an advanced natural language processing technique that enhances text generation by integrating information retrieved from a large corpus of documents. These techniques enable RAG to produce accurate and contextually relevant outputs with augmented external knowledge, and they have been widely used in various scenarios such as domain-specific chatbots (Siriwardhana et al., 2023) and email/code completion (Parvez et al., 2021). RAG systems typically work in two phases, as shown in Fig 1: retrieval and generation.

On the other hand, the retrieval process in RAG could also influence the behavior of the LLMs during text generation, and this could possibly cause the LLMs to output private information from their training/fine-tuning datasets. Notably, there are existing works (Carlini et al., 2021; Kandpal et al., 2022; Lee et al., 2021; Carlini et al., 2022; Zeng et al., 2023) observing that LLMs can memorize and leak private information from their pre-training and fine-tuning data. However, how the integration of external retrieval data affects such memorization and leakage remains an open question.
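To make the two phases concrete, below is a minimal illustrative sketch of a RAG pipeline, not the paper's implementation: the hashing-based embedding, the toy corpus, and the prompt format are placeholder assumptions, and the LLM call is stubbed out so the example stays self-contained.

```python
# Sketch of the two RAG phases: (1) retrieval of relevant documents,
# (2) generation conditioned on the retrieved context.
import numpy as np


def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashing embedding standing in for a dense retriever encoder."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Phase 1: return the k documents most similar to the query."""
    q = embed(query)
    scores = [float(q @ embed(doc)) for doc in corpus]
    top = np.argsort(scores)[::-1][:k]
    return [corpus[i] for i in top]


def generate(query: str, retrieved: list[str]) -> str:
    """Phase 2: augment the prompt with retrieved context.
    A real system would pass this prompt to an LLM; here we only
    assemble and return it."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    corpus = [
        "RAG augments generation with externally retrieved documents.",
        "Domain-specific chatbots often retrieve from private data stores.",
        "Email and code completion can also be retrieval-augmented.",
    ]
    query = "How does RAG work?"
    print(generate(query, retrieve(query, corpus)))
```

Note that the retrieval corpus in such a pipeline may itself contain sensitive records, which is precisely why the retrieval phase introduces privacy considerations beyond those of the underlying LLM.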
