Robust Federated Learning Against Poisoning Attacks: A GAN-Based Defense Framework

Zafar, Usama, Teixeira, André, Toor, Salman

arXiv.org Artificial Intelligence 

Abstract--Federated Learning (FL) enables collaborative model training across decentralized devices without sharing raw data, but it remains vulnerable to poisoning attacks that compromise model integrity. Existing defenses often rely on external datasets or predefined heuristics (e.g., …). To address these limitations, we propose a privacy-preserving defense framework that leverages a Conditional Generative Adversarial Network (cGAN) to generate synthetic data at the server for authenticating client updates, eliminating the need for external datasets. Our framework is scalable, adaptive, and integrates seamlessly into FL workflows. Extensive experiments on benchmark datasets demonstrate robust performance against a variety of poisoning attacks, achieving a high True Positive Rate (TPR) and True Negative Rate (TNR) in identifying malicious and benign clients, respectively, while maintaining model accuracy. The proposed framework offers a practical and effective solution for securing federated learning systems.

I. INTRODUCTION

In an era of data-driven artificial intelligence, organizations increasingly rely on large-scale machine learning models trained on vast amounts of user data. From personalized recommendation systems to predictive healthcare analytics, the success of these models hinges on access to diverse and representative datasets [1]. However, collecting and centralizing user data raises serious privacy concerns, as evidenced by high-profile data breaches and regulatory actions. Notable incidents, such as the Facebook-Cambridge Analytica scandal [2] and the Equifax data breach [3], have underscored the risks of centralized data storage and processing. These incidents not only resulted in significant financial penalties and reputational damage but also eroded public trust in data-driven technologies.
Companies such as Google and Facebook have faced substantial penalties for mishandling user data, with fines reaching billions of dollars under regulations like the General Data Protection Regulation (GDPR) [4] and the California Consumer Privacy Act (CCPA) [5]. The rising awareness of digital privacy has fueled the demand for decentralized learning paradigms that minimize data exposure while enabling collaborative model training.

Usama Zafar, André Teixeira, and Salman Toor are with the Department of Information Technology, Uppsala University, 751 05 Uppsala, Sweden.
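The abstract reports detection quality as the True Positive Rate over malicious clients and the True Negative Rate over benign clients. A minimal sketch of how these two rates are computed from per-client decisions (the function name and toy labels below are illustrative, not taken from the paper's experiments):

```python
def detection_rates(labels, preds):
    """Compute (TPR, TNR) for client filtering.

    labels, preds: per-client values, 1 = malicious, 0 = benign.
    TPR = fraction of malicious clients correctly flagged;
    TNR = fraction of benign clients correctly accepted.
    """
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return tpr, tnr

# Toy round: clients 0-1 are malicious, 2-4 benign;
# the defense flags clients 0 and 4.
labels = [1, 1, 0, 0, 0]
preds  = [1, 0, 0, 0, 1]
tpr, tnr = detection_rates(labels, preds)  # TPR = 0.5, TNR ≈ 0.67
```

A defense that flags nothing achieves TNR = 1 but TPR = 0, which is why the paper tracks both rates alongside final model accuracy.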