Dissecting Fine-Tuning Unlearning in Large Language Models
Yihuai Hong, Yuelin Zou, Lijie Hu, Ziqian Zeng, Di Wang, Haiqin Yang
Consequently, recent research has focused on developing efficient unlearning methods as a post-training technique to selectively unlearn the specific knowledge (Blanco-Justicia et al., 2024; Liu et al., 2023; Jang et al., 2023; Yao et al., 2024; Rafailov et al., 2023), with corresponding adjustments and designs in the loss function to facilitate … Although earlier investigations (Hong et al., 2024; Lee et al., 2024a) have … of these fine-tuning-based unlearning methods on LLaMA2-7B-chat (Touvron et al., 2023) and OLMo-7B (Groeneveld et al., 2024) by implementing them on the respective pretraining datasets of … We discover that while these methods appear to effectively unlearn target knowledge, they also inevitably affect the output and behavior related to unrelated knowledge.
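The "adjustments and designs in the loss function" the snippet alludes to can be made concrete with a small sketch. The code below is a minimal illustration, not the authors' code, of one representative objective from the cited line of work: gradient ascent on forget-set text combined with an ordinary language-modeling loss on retain-set text. The model identifier, learning rate, `alpha` weight, and helper functions are illustrative assumptions.

```python
# Minimal sketch of a fine-tuning-based unlearning step: raise the LM loss on
# forget data while keeping it low on retain data. Model choice, learning
# rate, and the `alpha` weight are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "your-base-model"  # e.g., LLaMA2-7B-chat or OLMo-7B, as in the paper
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

def lm_loss(texts):
    """Next-token cross-entropy over a batch of strings (pad positions ignored)."""
    batch = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    labels = batch["input_ids"].clone()
    labels[batch["attention_mask"] == 0] = -100  # do not score padding
    return model(**batch, labels=labels).loss

def unlearning_step(forget_texts, retain_texts, alpha=1.0):
    """One update: gradient ascent on forget data, descent on retain data."""
    loss = -lm_loss(forget_texts) + alpha * lm_loss(retain_texts)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example (hypothetical inputs):
# unlearning_step(["<passage to forget>"], ["<unrelated passage to keep>"])
```

The retain term is the kind of loss-function adjustment these methods rely on; the snippet's point is that, even so, the output and behavior on unrelated knowledge still shift.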
arXiv.org Artificial Intelligence
Oct-15-2024
- Country: North America > United States > California (0.14)
- Genre: Research Report (0.82)