Defending Against Beta Poisoning Attacks in Machine Learning Models

Gulciftci, Nilufer, Gursoy, M. Emre

arXiv.org Artificial Intelligence 

Poisoning attacks, in which an attacker adversarially manipulates the training dataset of a machine learning (ML) model, pose a significant threat to ML security. Beta Poisoning is a recently proposed poisoning attack that disrupts model accuracy by making the training dataset linearly nonseparable. In this paper, we propose four defense strategies against Beta Poisoning attacks: kNN Proximity-Based Defense (KPB), Neighborhood Class Comparison (NCC), Clustering-Based Defense (CBD), and Mean Distance Threshold (MDT). The defenses are based on our observations regarding the characteristics of poisoning samples generated by Beta Poisoning, e.g., poisoning samples have close proximity to one another, and they are centered near the mean of the target class. Experimental evaluations using MNIST and CIFAR-10 datasets demonstrate that KPB and MDT can achieve perfect accuracy and F1 scores, while CBD and NCC also provide strong defensive capabilities. Furthermore, by analyzing performance across varying parameters, we offer practical insights regarding defenses' behaviors under varying conditions.

Machine learning (ML) models have become integral components in various domains, including finance, healthcare, cybersecurity, and autonomous systems. However, the robustness and trustworthiness of ML models are frequently challenged by adversarial attacks [1]. Poisoning attacks constitute an important category of adversarial attacks, in which an attacker purposefully manipulates the training dataset to compromise the integrity of an ML model, e.g., degrade model accuracy or mislead its predictions [1], [2], [3].
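The observation stated in the abstract, that poisoning samples are centered near the mean of the target class, can be illustrated with a toy distance check. The sketch below is not the MDT algorithm proposed in this paper, whose details are not given in this section. The function names `class_mean` and `flag_near_mean`, the use of Euclidean distance, and the fixed `threshold` parameter are all assumptions made for illustration only.

```python
import math


def class_mean(samples):
    """Component-wise mean of a list of equal-length feature vectors."""
    n = len(samples)
    return [sum(column) / n for column in zip(*samples)]


def flag_near_mean(samples, threshold):
    """Return indices of samples lying closer than `threshold` to the mean.

    Hypothetical helper: it only illustrates the idea that points crowded
    around a class mean may be suspicious. The threshold value and the
    distance metric are illustrative assumptions, not the paper's choices.
    """
    mean = class_mean(samples)
    return [
        index
        for index, sample in enumerate(samples)
        if math.dist(sample, mean) < threshold
    ]


if __name__ == "__main__":
    # Four spread-out points plus two points crowded near the centre.
    samples = [
        [0.0, 0.0],
        [4.0, 0.0],
        [0.0, 4.0],
        [4.0, 4.0],
        [2.0, 2.1],
        [2.1, 2.0],
    ]
    print(flag_near_mean(samples, threshold=0.5))
```

In this toy input, only the two points crowded near the centre fall within the threshold. In practice, the threshold would likely need to be tuned per dataset and per class, since a fixed value can also flag legitimate samples that happen to lie near the mean.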