Woisetschlaeger, Herbert
A Comprehensive Survey of Machine Unlearning Techniques for Large Language Models
Geng, Jiahui, Li, Qing, Woisetschlaeger, Herbert, Chen, Zongxiong, Wang, Yuxia, Nakov, Preslav, Jacobsen, Hans-Arno, Karray, Fakhri
This study investigates machine unlearning techniques within the context of large language models (LLMs), referred to as LLM unlearning. LLM unlearning offers a principled approach to removing the influence of undesirable data (e.g., sensitive or illegal information) from LLMs while preserving their overall utility, without requiring full retraining. Despite growing research interest, there is no comprehensive survey that systematically organizes existing work and distills key insights; here, we aim to bridge this gap. We begin by introducing the definition and paradigms of LLM unlearning, followed by a comprehensive taxonomy of existing unlearning studies. Next, we categorize current unlearning approaches, summarizing their strengths and limitations. Additionally, we review evaluation metrics and benchmarks, providing a structured overview of current assessment methodologies. Finally, we outline promising directions for future research, highlighting key challenges and opportunities in the field.
A Survey on Dataset Distillation: Approaches, Applications and Future Directions
Geng, Jiahui, Chen, Zongxiong, Wang, Yuandou, Woisetschlaeger, Herbert, Schimmler, Sonja, Mayer, Ruben, Zhao, Zhiming, Rong, Chunming
Dataset distillation is attracting growing attention in machine learning as training sets continue to grow and the cost of training state-of-the-art models becomes increasingly high. By synthesizing datasets with high information density, dataset distillation offers a range of potential applications, including support for continual learning, neural architecture search, and privacy protection. Despite recent advances, we lack a holistic understanding of the approaches and applications. Our survey aims to bridge this gap by first proposing a taxonomy of dataset distillation and characterizing existing approaches, and then systematically reviewing the data modalities and related applications. In addition, we summarize the challenges and discuss future directions for this field of research.
A Comprehensive Study on Dataset Distillation: Performance, Privacy, Robustness and Fairness
Chen, Zongxiong, Geng, Jiahui, Zhu, Derui, Woisetschlaeger, Herbert, Li, Qing, Schimmler, Sonja, Mayer, Ruben, Rong, Chunming
The aim of dataset distillation is to encode the rich features of an original dataset into a tiny dataset. It is a promising approach to accelerate neural network training and related studies. Different approaches have been proposed to improve the informativeness and generalization performance of distilled images. However, no work has comprehensively analyzed this technique from a security perspective, and there is a lack of systematic understanding of potential risks. In this work, we conduct extensive experiments to evaluate current state-of-the-art dataset distillation methods. We successfully use membership inference attacks to show that privacy risks remain. Our work also demonstrates that dataset distillation can affect model robustness to varying degrees and amplify model unfairness across classes when making predictions. This work offers a large-scale benchmarking framework for dataset distillation evaluation.