Dissecting Fine-Tuning Unlearning in Large Language Models

Yihuai Hong, Yuelin Zou, Lijie Hu, Ziqian Zeng, Di Wang, Haiqin Yang

arXiv.org Artificial Intelligence 

Consequently, recent research has focused on developing efficient unlearning methods as a post-training technique to selectively unlearn the specific knowledge (Blanco-Justicia et al., 2024; Liu et al., 2024; Jang et al., 2023; Yao et al., 2024; Rafailov et al., 2023), with corresponding adjustments and designs in the loss function to facilitate unlearning. Although earlier investigations (Hong et al., 2024; Lee et al., 2024a) have studied these fine-tuning-based unlearning methods, we evaluate them on LLaMA2-7B-chat (Touvron et al., 2023) and OLMo-7B (Groeneveld et al., 2024) by implementing them on the models' respective pretraining datasets. We discover that while these methods appear to effectively unlearn target knowledge, they also inevitably affect the output and behavior related to unrelated knowledge.
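The "adjusted loss function" recipe behind many of these fine-tuning-based unlearning methods can be illustrated with a minimal sketch: ascend the loss on examples to forget while descending it on examples to retain. The sketch below uses a toy bigram table in place of an LLM; all sizes, learning rates, and the specific (context, token) pairs are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
V = 8                                # toy vocabulary size (illustrative)
W = rng.normal(size=(V, V)) * 0.1    # one logit row per context token

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Hypothetical knowledge: context 0 -> token 3 is to be forgotten,
# context 1 -> token 5 is unrelated knowledge to be kept.
forget = (0, 3)
retain = (1, 5)

def sgd_step(pair, lr, ascend=False):
    c, t = pair
    p = softmax(W[c])
    grad = p.copy()
    grad[t] -= 1.0                   # d(NLL)/d(logits) = p - onehot(t)
    W[c] += lr * grad if ascend else -lr * grad

# "Pretrain" the toy model so it knows both facts.
for _ in range(50):
    sgd_step(forget, 0.5)
    sgd_step(retain, 0.5)

p_forget_before = softmax(W[forget[0]])[forget[1]]

# Unlearning: gradient ASCENT on the forget example, while a retain
# term keeps fitting the knowledge that should be preserved.
for _ in range(300):
    sgd_step(forget, 0.5, ascend=True)
    sgd_step(retain, 0.5)

p_forget_after = softmax(W[forget[0]])[forget[1]]
p_retain_after = softmax(W[retain[0]])[retain[1]]
```

In this toy the forget and retain rows are disjoint, so the retain probability survives by construction; in a real transformer the parameters are shared, which is exactly why, as discussed above, unlearning updates can spill over onto unrelated knowledge.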