Knowledge Sanitization of Large Language Models
Ishibashi, Yoichi, Shimodaira, Hidetoshi
arXiv.org Artificial Intelligence
We explore a knowledge sanitization approach to mitigate the privacy concerns associated with large language models (LLMs). LLMs trained on a large corpus of Web data can memorize and potentially reveal sensitive or confidential information, raising critical security concerns. Our technique fine-tunes these models, prompting them to generate harmless responses such as "I don't know" when queried about specific information. Experimental results in a closed-book question-answering task show that our straightforward method not only minimizes leakage of the targeted knowledge but also preserves the overall performance of the LLM. These two advantages strengthen the defense against extraction attacks and reduce the emission of harmful content such as hallucinations.
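The fine-tuning idea in the abstract can be illustrated with a small data-preparation sketch. The abstract does not specify the training procedure, so everything below is an assumption made for illustration: the names `QAPair`, `build_sanitization_set`, `is_sensitive`, and `leakage_rate` are hypothetical, and the substring-based leakage check is a stand-in, not the paper's evaluation metric. The sketch only shows how question-answer pairs might be rewritten so that targeted questions map to a harmless response while all other pairs keep their original answers.

```python
from dataclasses import dataclass

# Harmless response quoted in the abstract; the exact wording used for
# training is an assumption.
SAFE_RESPONSE = "I don't know."


@dataclass(frozen=True)
class QAPair:
    """A closed-book question with its reference answer (hypothetical type)."""
    question: str
    answer: str


def build_sanitization_set(pairs, is_sensitive, safe_response=SAFE_RESPONSE):
    """Build fine-tuning targets.

    Pairs flagged by `is_sensitive` get the harmless response as their
    target; every other pair keeps its original answer, so that general
    question-answering ability is retained during fine-tuning.
    """
    examples = []
    for pair in pairs:
        target = safe_response if is_sensitive(pair) else pair.answer
        examples.append({"prompt": pair.question, "completion": target})
    return examples


def leakage_rate(predictions, forbidden_answers):
    """Fraction of predictions that still contain the forbidden answer.

    Uses a case-insensitive substring match, which is a simplification.
    """
    if not predictions:
        return 0.0
    leaked = sum(
        1
        for pred, secret in zip(predictions, forbidden_answers)
        if secret.lower() in pred.lower()
    )
    return leaked / len(predictions)
```

A possible usage, with made-up data: calling `build_sanitization_set(pairs, lambda p: "Jane Doe" in p.question)` replaces the answer of every question mentioning the hypothetical name "Jane Doe" and leaves the rest unchanged. The resulting prompt and completion pairs could then be passed to any standard supervised fine-tuning setup.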
Sep-21-2023
- Country:
  - North America
    - Montserrat (0.04)
    - Dominican Republic (0.04)
    - United States
      - Washington > King County
        - Seattle (0.04)
      - Minnesota > Hennepin County
        - Minneapolis (0.14)
      - California > Los Angeles County
        - Long Beach (0.04)
    - Canada
      - Ontario > Toronto (0.04)
      - British Columbia > Metro Vancouver Regional District
        - Vancouver (0.04)
  - Europe
    - Austria > Vienna (0.14)
    - Italy > Tuscany
      - Florence (0.04)
    - Ireland > Leinster
      - County Dublin > Dublin (0.04)
    - Denmark > Capital Region
      - Copenhagen (0.04)
    - Croatia > Zagreb County
      - Zagreb (0.04)
    - Belgium > Brussels-Capital Region
      - Brussels (0.04)
  - Asia
    - Afghanistan (0.04)
    - Middle East
      - Republic of Türkiye (0.04)
      - UAE > Abu Dhabi Emirate
        - Abu Dhabi (0.04)
    - Japan > Honshū
      - Kansai > Kyoto Prefecture > Kyoto (0.04)
    - China > Shaanxi Province
      - Xi'an (0.04)
  - Africa
    - Middle East > Egypt (0.05)
    - Rwanda > Kigali
      - Kigali (0.04)
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Information Technology > Security & Privacy (1.00)
- Technology: