Knowledge Sanitization of Large Language Models

Ishibashi, Yoichi, Shimodaira, Hidetoshi

Sep-21-2023–arXiv.org Artificial Intelligence

We explore a knowledge sanitization approach to mitigate the privacy concerns associated with large language models (LLMs). LLMs trained on a large corpus of Web data can memorize and potentially reveal sensitive or confidential information, raising critical security concerns. Our technique fine-tunes these models so that they generate harmless responses such as "I don't know" when queried about specific information. Experimental results in a closed-book question-answering task show that our straightforward method not only minimizes leakage of the targeted knowledge but also preserves the overall performance of the LLM. These two advantages strengthen the defense against extraction attacks and reduce the emission of harmful content such as hallucinations.
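The abstract does not give implementation details, so the following is only a minimal sketch of how fine-tuning targets for such a sanitization step might be assembled, assuming a closed-book QA dataset of question/answer pairs. The names `QAPair`, `build_sanitization_set`, and `leakage_rate`, the sample questions, and the substring-based leakage check are all illustrative assumptions, not the authors' code or metric.

```python
from dataclasses import dataclass

# Harmless reply quoted in the abstract; the trailing period is an assumption.
SAFE_RESPONSE = "I don't know."


@dataclass(frozen=True)
class QAPair:
    question: str
    answer: str


def _norm(text):
    """Normalize text for case-insensitive matching."""
    return text.strip().lower()


def build_sanitization_set(qa_pairs, forget_questions, safe_response=SAFE_RESPONSE):
    """Build fine-tuning targets (hypothetical helper).

    Questions in the forget set are paired with the safe response;
    all other questions keep their original answer, so that general
    QA ability is still trained on.
    """
    forget = {_norm(q) for q in forget_questions}
    return [
        QAPair(p.question, safe_response) if _norm(p.question) in forget else p
        for p in qa_pairs
    ]


def leakage_rate(predictions, qa_pairs, forget_questions):
    """Fraction of forget-set questions whose prediction still contains
    the original answer (a simple substring check, assumed here)."""
    forget = {_norm(q) for q in forget_questions}
    targets = [p for p in qa_pairs if _norm(p.question) in forget]
    if not targets:
        return 0.0
    leaked = sum(
        1 for p in targets
        if _norm(p.answer) in _norm(predictions.get(p.question, ""))
    )
    return leaked / len(targets)


if __name__ == "__main__":
    # Made-up example data: one general fact, one piece of private information.
    data = [
        QAPair("What is the capital of France?", "Paris"),
        QAPair("What is Jane Doe's phone number?", "555-0100"),
    ]
    forget = ["What is Jane Doe's phone number?"]
    for pair in build_sanitization_set(data, forget):
        print(pair.question, "->", pair.answer)
```

The resulting pairs could then be passed to any standard supervised fine-tuning loop. Keeping the original answers for questions outside the forget set reflects the abstract's second goal, preserving overall performance.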


Country:
- North America
  - Montserrat (0.04)
  - Dominican Republic (0.04)
  - United States
    - Washington > King County
      - Seattle (0.04)
    - Minnesota > Hennepin County
      - Minneapolis (0.14)
    - California > Los Angeles County
      - Long Beach (0.04)
  - Canada
    - Ontario > Toronto (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - Austria > Vienna (0.14)
  - Italy > Tuscany
    - Florence (0.04)
  - Ireland > Leinster
    - County Dublin > Dublin (0.04)
  - Denmark > Capital Region
    - Copenhagen (0.04)
  - Croatia > Zagreb County
    - Zagreb (0.04)
  - Belgium > Brussels-Capital Region
    - Brussels (0.04)
- Asia
  - Afghanistan (0.04)
  - Middle East
    - Republic of Türkiye (0.04)
    - UAE > Abu Dhabi Emirate
      - Abu Dhabi (0.04)
  - Japan > Honshū
    - Kansai > Kyoto Prefecture > Kyoto (0.04)
  - China > Shaanxi Province
    - Xi'an (0.04)
- Africa
  - Middle East > Egypt (0.05)
  - Rwanda > Kigali
    - Kigali (0.04)

Genre:
- Research Report > New Finding (0.46)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
