Adversarial training with restricted data manipulation

Benfield, David, Coniglio, Stefano, Vuong, Phan Tu, Zemkoho, Alain

arXiv.org Artificial Intelligence 

Adversarial machine learning studies the exploitable vulnerabilities of machine learning models and the strategies needed to counter or mitigate such threats [32]. By considering these vulnerabilities during the development stage of our machine learning models, we can work to build resilient methods [9, 11], with applications such as protecting against credit card fraud [35] or finding the optimal placement of air defence systems [20]. In particular, we consider a model's sensitivity to changes in the distribution of the data. The ways in which an adversary can influence the distribution fall into numerous categories; see [21] for a helpful taxonomy of these attacks. We focus on the specific case of exploratory attacks, in which adversaries attempt to modify their data to evade detection by a classifier. Such attacks can occur in security settings such as malware detection [3] and network intrusion traffic [31]. In a similar vein, and more recently, vulnerabilities in deep neural networks (DNNs) have been discovered, particularly in computer vision and image classification: small perturbations in the data can lead to incorrect classifications by the DNN [33, 19]. These vulnerabilities raise concerns about the robustness of the machine learning technology being adopted and, in some cases, about how safe it is to rely on its predictions in high-risk scenarios such as autonomous driving [15] and medical diagnosis [16]. By modelling the adversary's behaviour and anticipating these attacks, we can train classifiers that are resilient to such changes in the distribution before they occur.
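To make the notion of an exploratory (evasion) attack concrete, the following is a minimal sketch, not the method of this paper: for a hypothetical linear classifier with weights `w` and bias `b` (both invented here for illustration), an adversary can shift a point against the weight vector, the direction that decreases the decision score fastest, with the size of the manipulation bounded by a budget `eps`.

```python
import numpy as np

# Hypothetical linear classifier: predicts class 1 ("malicious")
# when w @ x + b > 0. The weights below are illustrative only.
w = np.array([2.0, -1.0])
b = 0.5

def predict(x):
    return int(w @ x + b > 0)

# A point the classifier currently flags as class 1.
x = np.array([1.0, 0.5])          # score: 2.0, so predicted class 1

# Evasion attack: move x against w, the steepest-descent direction
# of the decision score, by at most eps in Euclidean norm. The score
# drops by eps * ||w||, flipping the prediction if the budget suffices.
eps = 1.5
x_adv = x - eps * w / np.linalg.norm(w)
```

With this budget the score falls from 2.0 to roughly -1.35, so the perturbed point evades detection; a smaller `eps` would leave the prediction unchanged, which is the restricted-manipulation trade-off the adversary faces.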