Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Yan, Han, Liu, Zheyuan, Jiang, Meng
arXiv.org Artificial Intelligence
However, given limited time and computational resources, retraining LLMs to mitigate the influence of undesired data is impractical. Thus, Machine Unlearning (MU) has emerged as an alternative solution that weakens a model's performance on undesired knowledge (Liu et al., 2024b; Eldan & Russinovich, 2023) while preserving the model's original utility (Liu et al., 2025). Although much research has shed light on MU, several recent studies indicate that MU still lacks robustness (Zhang et al., 2024c; Yuan et al., 2025; Lee et al., 2025). In particular, unlearned models are susceptible to both jailbreak attacks (Zou et al., 2023; Andriushchenko et al., 2024) and relearning attacks (Hu et al., 2024). Such limitations can be exploited by reusing a small amount of unlearned knowledge (Hu et al., 2024) or through adversarial prompt manipulations, including prefix injection (Andriushchenko et al., 2024) and adaptive jailbreaks (Liu et al., 2023). These attacks act as small perturbations in parameter or representation space, driving the model along directions that yield undesired content that should have been forgotten (Fan et al., 2025; Lin et al., 2024b). Smoothness minimization can be introduced to enhance model robustness against these attacks by promoting a smooth loss across a local neighborhood (Foret et al., 2020; Fan et al., 2025). Many studies have sought to remove undesired data to improve the effectiveness of MU, and these approaches have demonstrated substantial unlearning performance (Liu et al., 2022a; Thudi et al., 2022; Zou et al., 2024; Pawelczyk et al., 2023; Liu et al., 2024a). However, they still suffer from limitations in robustness and in the trade-off between model utility and unlearning effectiveness.
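To make the idea of "a smooth loss across the neighborhood" concrete, here is a minimal sketch of a sharpness-aware update in the style of Foret et al. (2020), applied in parameter space only. It is an illustration under stated assumptions, not the method proposed in this paper: the two-parameter toy loss, the function names (`loss`, `grad`, `worst_case_perturbation`, `sam_step`), and the values of `lr` and `rho` are all hypothetical.

```python
import math

def loss(theta):
    # Hypothetical toy loss: steep along x, nearly flat along y.
    x, y = theta
    return 10.0 * x * x + 0.1 * y * y

def grad(theta):
    # Analytic gradient of the toy loss above.
    x, y = theta
    return [20.0 * x, 0.2 * y]

def worst_case_perturbation(theta, rho):
    # First-order approximation of the loss-maximizing perturbation
    # inside an L2 ball of radius rho: rho * g / ||g||.
    g = grad(theta)
    norm = math.sqrt(sum(gi * gi for gi in g)) + 1e-12
    return [rho * gi / norm for gi in g]

def sam_step(theta, lr=0.01, rho=0.05):
    # Step 1: move to the approximate worst point in the neighborhood.
    eps = worst_case_perturbation(theta, rho)
    perturbed = [t + e for t, e in zip(theta, eps)]
    # Step 2: descend from the original point using the gradient
    # evaluated at the perturbed point.
    g_adv = grad(perturbed)
    return [t - lr * ga for t, ga in zip(theta, g_adv)]

theta = [1.0, 1.0]
initial_loss = loss(theta)
for _ in range(200):
    theta = sam_step(theta)
final_loss = loss(theta)
```

Because each update uses the gradient at the perturbed point, the optimizer is pushed toward regions where the loss stays low under small parameter changes, which is the property the text associates with robustness to relearning-style perturbations. Extending the same idea to representation space would need a model with intermediate activations and is not shown here.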
Sep-30-2025
- Genre:
- Research Report > New Finding (0.87)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Law (0.93)
- Government > Military (0.67)
- Technology: