Tang, Minxue
Proactive Privacy Amnesia for Large Language Models: Safeguarding PII with Negligible Impact on Model Utility
Kuo, Martin, Zhang, Jingyang, Zhang, Jianyi, Tang, Minxue, DiValentin, Louis, Ding, Aolin, Sun, Jingwei, Chen, William, Hass, Amin, Chen, Tianlong, Chen, Yiran, Li, Hai
With the rise of large language models (LLMs), a growing body of research has recognized their risk of leaking personally identifiable information (PII) under malicious attacks. Although efforts have been made to protect PII in LLMs, existing methods struggle to balance privacy protection with maintaining model utility. In this paper, inspired by studies of amnesia in cognitive science, we propose a novel approach, Proactive Privacy Amnesia (PPA), to safeguard PII in LLMs while preserving their utility. This mechanism works by actively identifying and forgetting the key memories most closely associated with PII in sequences, followed by memory implanting with suitable substitute memories to maintain the LLM's functionality. We conduct evaluations across multiple models to protect common PII, such as phone numbers and physical addresses, against prevalent PII-targeted attacks, demonstrating the superiority of our method compared with other existing defensive techniques. The results show that our PPA method eliminates the risk of phone number exposure entirely (a 100% reduction) and significantly reduces the risk of physical address exposure by 9.8% to 87.6%, all while maintaining comparable model utility.
Large Language Models (LLMs) (Touvron et al., 2023; Achiam et al., 2023; Team et al., 2023; Dubey et al., 2024) have achieved remarkable success in recent years, with wide adoption either as general-purpose models or, after fine-tuning, as specialized and personal assistants. Despite this success, LLMs, with their huge parameter counts and great capacity, also exhibit a concerning "memorization" phenomenon (Carlini et al., 2019; 2021), i.e., they can precisely memorize some training data. Such memorization is vulnerable to various attacks (e.g., membership inference attacks and data extraction attacks) and risks severe privacy breaches.
One of the most serious concerns comes from attacks that aim to extract personally identifiable information (PII) memorized by the models, which compromise users' privacy and are consequently likely to cause real-world harm. To defend against such PII or data extraction attacks, several machine unlearning techniques have been applied to LLMs. However, existing methods typically fall short in the trade-off between defense performance and model utility. For example, most unlearning approaches are based on gradient ascent (Jang et al., 2022; Wang et al., 2024) and often degrade model functionality to the extent that the model can no longer handle its original tasks and is thus no longer useful.
Fed-CBS: A Heterogeneity-Aware Client Sampling Mechanism for Federated Learning via Class-Imbalance Reduction
Zhang, Jianyi, Li, Ang, Tang, Minxue, Sun, Jingwei, Chen, Xiang, Zhang, Fan, Chen, Changyou, Chen, Yiran, Li, Hai
Due to the limited communication capacities of edge devices, most existing federated learning (FL) methods randomly select only a subset of devices to participate in training in each communication round. Compared with engaging all the available clients, the random-selection mechanism can lead to significant performance degradation on non-IID (non-independent and identically distributed) data. In this paper, we show that the essential cause of such performance degradation is the class-imbalance of the grouped data from randomly selected clients. Based on this key observation, we design an efficient heterogeneity-aware client sampling mechanism, i.e., Federated Class-balanced Sampling (Fed-CBS), which can effectively reduce the class-imbalance of the grouped dataset from the intentionally selected clients. In particular, we propose a measure of class-imbalance and then employ homomorphic encryption to derive this measure in a privacy-preserving way. Based on this measure, we also design a computation-efficient client sampling strategy, such that the actively selected clients will generate a more class-balanced grouped dataset with theoretical guarantees. Extensive experimental results demonstrate that Fed-CBS outperforms the status quo approaches. Furthermore, it achieves comparable or even better performance than the ideal setting where all the available clients participate in the FL training.
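The idea of selecting clients so that their pooled data is class-balanced can be illustrated with a small sketch. The abstract does not give the measure or the sampling rule, so everything here is an assumption made for illustration: the measure is taken to be the squared distance between the pooled class distribution and the uniform distribution, the selection is a plain greedy loop rather than the paper's sampling strategy, and label counts are handled in plaintext instead of under homomorphic encryption. The names `class_imbalance` and `greedy_select` are hypothetical.

```python
def class_imbalance(counts):
    """Squared distance between the pooled class distribution and uniform.

    Assumed measure for illustration only; requires sum(counts) > 0.
    """
    total = sum(counts)
    num_classes = len(counts)
    return sum((c / total - 1.0 / num_classes) ** 2 for c in counts)


def greedy_select(client_counts, num_selected):
    """Greedily pick clients whose pooled label counts stay close to uniform.

    client_counts[i][c] is the number of samples of class c on client i.
    This greedy rule is a stand-in, not the strategy from the paper.
    """
    selected = []
    pooled = [0] * len(client_counts[0])
    remaining = set(range(len(client_counts)))
    for _ in range(num_selected):
        # Score each candidate by the imbalance of the pool after adding it.
        best = min(
            sorted(remaining),
            key=lambda i: class_imbalance(
                [p + c for p, c in zip(pooled, client_counts[i])]
            ),
        )
        selected.append(best)
        remaining.remove(best)
        pooled = [p + c for p, c in zip(pooled, client_counts[best])]
    return selected


# Two classes, four clients with skewed local label counts.
clients = [[10, 0], [0, 10], [9, 1], [10, 0]]
chosen = greedy_select(clients, 2)
```

In this toy setting the loop first takes the least skewed client and then the client that best offsets its skew, whereas a uniformly random pair could easily contain two clients holding only the first class.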
FADE: Enabling Federated Adversarial Training on Heterogeneous Resource-Constrained Edge Devices
Tang, Minxue, Zhang, Jianyi, Ma, Mingyuan, DiValentin, Louis, Ding, Aolin, Hassanzadeh, Amin, Li, Hai, Chen, Yiran
Federated adversarial training can effectively bring adversarial robustness to privacy-preserving federated learning systems. However, its high demand for memory capacity and computing power makes large-scale federated adversarial training infeasible on resource-constrained edge devices. Few previous studies in federated adversarial training have tried to tackle both memory and computational constraints simultaneously. In this paper, we propose a new framework named Federated Adversarial Decoupled Learning (FADE) to enable adversarial training (AT) on heterogeneous resource-constrained edge devices. FADE differentially decouples the entire model into small modules to fit into the resource budget of each device, and each device only needs to perform AT on a single module in each communication round. We also propose an auxiliary weight decay to alleviate objective inconsistency and achieve a better accuracy-robustness balance in FADE. FADE offers theoretical guarantees for convergence and adversarial robustness, and our experimental results show that FADE can significantly reduce the consumption of memory and computing power while maintaining accuracy and robustness.
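As a rough illustration of fitting modules to a device's resource budget, the sketch below splits a sequence of layers into contiguous modules whose summed cost stays within a per-device budget. The abstract does not say how FADE performs the decoupling, so the greedy contiguous split, the single scalar cost per layer, and the name `decouple` are all assumptions for illustration.

```python
def decouple(layer_costs, budget):
    """Split layers into contiguous modules that each fit within `budget`.

    layer_costs[i] is an assumed scalar cost (e.g. memory) of layer i.
    Returns a list of modules, each a list of layer indices.
    Hypothetical helper, not the partitioning rule from the paper.
    """
    modules, current, used = [], [], 0
    for idx, cost in enumerate(layer_costs):
        if cost > budget:
            raise ValueError(f"layer {idx} alone exceeds the budget")
        # Close the current module when the next layer would not fit.
        if current and used + cost > budget:
            modules.append(current)
            current, used = [], 0
        current.append(idx)
        used += cost
    if current:
        modules.append(current)
    return modules


costs = [4, 3, 2, 5, 1]
small_device = decouple(costs, budget=6)   # many small modules
large_device = decouple(costs, budget=15)  # the whole model as one module
```

Devices with different budgets end up with different partitions of the same model, which is one simple way to picture modules being sized per device.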
Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
Li, Siyuan, Wang, Rui, Tang, Minxue, Zhang, Chongjie
Hierarchical Reinforcement Learning (HRL) is a promising approach to solving long-horizon problems with sparse and delayed rewards. Many existing HRL algorithms either use pre-trained low-level skills that are unadaptable, or require domain-specific information to define low-level rewards. In this paper, we aim to adapt low-level skills to downstream tasks while maintaining the generality of reward design. We propose an HRL framework which sets auxiliary rewards for low-level skill training based on the advantage function of the high-level policy. This auxiliary reward enables efficient, simultaneous learning of the high-level policy and low-level skills without using task-specific knowledge. In addition, we theoretically prove that optimizing low-level skills with this auxiliary reward will increase the task return for the joint policy. Experimental results show that our algorithm dramatically outperforms other state-of-the-art HRL methods in MuJoCo domains. We also find that both the low-level and high-level policies trained by our algorithm are transferable.
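A minimal sketch of an advantage-based auxiliary reward follows. The abstract only states that the low-level reward is based on the high-level advantage function, so the specifics are assumptions: the advantage is estimated with a one-step temporal-difference form, and it is shared equally across the `k` low-level steps that make up one high-level action. The function names are hypothetical.

```python
def high_level_advantage(reward, value_s, value_next, gamma=0.99):
    """One-step advantage estimate: r + gamma * V(s') - V(s).

    `reward` is the environment reward accumulated over one high-level step;
    `value_s` and `value_next` are the high-level critic's estimates.
    """
    return reward + gamma * value_next - value_s


def auxiliary_rewards(advantage, k):
    """Share the high-level advantage equally over k low-level steps.

    Equal sharing is an assumption made for this sketch.
    """
    return [advantage / k for _ in range(k)]


adv = high_level_advantage(reward=1.0, value_s=2.0, value_next=3.0, gamma=0.5)
low_level = auxiliary_rewards(adv, k=5)
```

Because the signal comes from the high-level critic rather than a hand-written shaping term, the low-level skill gets positive reward when the chosen high-level action turned out better than expected, without any task-specific knowledge.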