Probing Knowledge Holes in Unlearned LLMs

Jun-15-2026, 14:26:19 GMT–Neural Information Processing Systems

Machine unlearning has emerged as a prevalent technical solution for selectively removing unwanted knowledge absorbed during pre-training, without requiring full retraining. While recent unlearning techniques can effectively remove undesirable content without severely compromising performance on standard benchmarks, we find that they may inadvertently create "knowledge holes": unintended losses of benign knowledge that standard benchmarks fail to capture. To probe where unlearned models reveal knowledge holes, we propose a test case generation framework that explores both immediate neighbors of unlearned content and broader areas of potential failures. Our evaluation demonstrates significant hidden costs of unlearning: up to 98.7% of the test cases yield irrelevant or nonsensical responses from unlearned models, despite being answerable by the pretrained model.
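The headline metric in the abstract (the share of test cases that the pretrained model can answer but the unlearned model cannot) can be illustrated with a minimal sketch. This is not the authors' framework: the `ProbeResult` record, the `knowledge_hole_rate` function, and the boolean relevance labels are all hypothetical, and how a response is judged "irrelevant or nonsensical" is assumed to be decided elsewhere.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ProbeResult:
    """Outcome of one probe prompt (hypothetical record, not from the paper)."""
    prompt: str
    pretrained_ok: bool  # assumed label: pretrained model gave a relevant answer
    unlearned_ok: bool   # assumed label: unlearned model gave a relevant answer


def knowledge_hole_rate(results: Iterable[ProbeResult]) -> float:
    """Fraction of pretrained-answerable prompts that the unlearned model fails."""
    # Only prompts the pretrained model could answer count toward the rate.
    answerable = [r for r in results if r.pretrained_ok]
    if not answerable:
        return 0.0
    holes = sum(1 for r in answerable if not r.unlearned_ok)
    return holes / len(answerable)
```

Restricting the denominator to prompts the pretrained model answers keeps the rate from being inflated by questions neither model could handle.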

large language model, machine learning, natural language

Country:
- North America (0.28)

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Health & Medicine
  - Pharmaceuticals & Biotechnology (1.00)
  - Consumer Health (1.00)
  - Therapeutic Area
    - Psychiatry/Psychology (0.94)
    - Infections and Infectious Diseases (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (1.00)
  - Machine Learning (1.00)
  - Representation & Reasoning (0.93)
