Just How Toxic is Data Poisoning? A Unified Benchmark for Backdoor and Data Poisoning Attacks
Avi Schwarzschild, Micah Goldblum, Arjun Gupta, John P. Dickerson, Tom Goldstein
Data poisoning and backdoor attacks manipulate training data in order to cause models to fail during inference. A recent survey of industry practitioners found that data poisoning is the number one concern among threats ranging from model stealing to adversarial attacks. However, we find that the impressive performance evaluations of data poisoning attacks are, in large part, artifacts of inconsistent experimental design. Moreover, we find that existing poisoning methods have been tested in contrived scenarios, and many fail in more realistic settings. In order to promote fair comparison in future work, we develop standardized benchmarks for data poisoning and backdoor attacks.

Data poisoning is a security threat to machine learning systems in which an attacker controls the behavior of a system by manipulating its training data. This class of threats is particularly germane to deep learning systems because they require large amounts of data to train and are therefore often trained (or pre-trained) on large datasets scraped from the web. For example, the Open Images and the Amazon Products datasets contain approximately 9 million and 233 million samples, respectively, scraped from a wide range of potentially insecure, and in many cases unknown, sources (Kuznetsova et al., 2020; Ni, 2018). At this scale, it is often infeasible to properly vet content. Furthermore, many practitioners create datasets by harvesting system inputs (e.g., emails received, files uploaded) or scraping user-created content (e.g., profiles, text messages, advertisements) without any mechanism to bar malicious actors from contributing data. The dependence of industrial AI systems on datasets that are not manually inspected has led to fear that corrupted training data could produce faulty models (Jiang et al., 2017). In fact, a recent survey of 28 industry organizations found that these companies are significantly more afraid of data poisoning than of other threats from adversarial machine learning (Kumar et al., 2020).

A spectrum of poisoning attacks exists in the literature. Backdoor data poisoning causes a model to misclassify test-time samples that contain a trigger, i.e., a visual feature in images or a particular character sequence in the natural language setting (Chen et al., 2017; Dai et al., 2019; Saha et al., 2020). For example, one might tamper with training images so that a vision system fails to identify any person wearing a shirt with the trigger symbol printed on it.
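To make the trigger mechanism concrete, the sketch below shows one common way such an attack is simulated in image classification: a small patch is stamped onto a fraction of the training images, and those samples are relabeled to an attacker-chosen target class (the classic "dirty-label" backdoor recipe). This is a minimal illustration, assuming NumPy arrays of HWC uint8 images; the function names, patch placement, and 1% poisoning rate are assumptions for exposition, not the specific attacks evaluated in the benchmark.

```python
import numpy as np


def apply_patch_trigger(image, patch, corner="bottom_right"):
    """Stamp a small trigger patch onto an (H, W, C) uint8 image.

    Illustrative only: patch shape, location, and overwrite-style blending
    are assumptions, not the triggers used in the paper's benchmark.
    """
    poisoned = image.copy()
    ph, pw = patch.shape[:2]
    if corner == "bottom_right":
        poisoned[-ph:, -pw:, :] = patch
    else:  # place in the top-left corner instead
        poisoned[:ph, :pw, :] = patch
    return poisoned


def poison_training_set(images, labels, patch, target_label,
                        poison_fraction=0.01, seed=0):
    """Return a poisoned copy of a training set.

    A random subset of samples (poison_fraction of the data) receives the
    trigger patch and is relabeled to the attacker's target class.
    """
    rng = np.random.default_rng(seed)
    images, labels = images.copy(), labels.copy()
    n_poison = int(poison_fraction * len(images))
    idx = rng.choice(len(images), size=n_poison, replace=False)
    for i in idx:
        images[i] = apply_patch_trigger(images[i], patch)
        labels[i] = target_label
    return images, labels, idx
```

A model trained on the resulting set tends to associate the patch with the target class, so at test time any input carrying the patch is likely to be misclassified, while clean inputs are classified normally.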
Nov-2-2020
- Country:
- North America > United States > Maryland (0.14)
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology > Security & Privacy (1.00)
- Technology: