Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

Micah Goldblum, Dimitris Tsipras, Chulin Xie, Xinyun Chen, Avi Schwarzschild, Dawn Song, Aleksander Madry, Bo Li, Tom Goldstein

arXiv.org Artificial Intelligence 

Traditional approaches to computer security isolate systems from the outside world through a combination of firewalls, passwords, data encryption, and other access control measures. In contrast, dataset creators often invite the outside world in: data-hungry neural network models are built by harvesting information from anonymous and unverified sources on the web. Such open-world dataset creation methods can be exploited in several ways. Outsiders can passively manipulate datasets by placing corrupted data on the web and waiting for data-harvesting bots to collect it. Active dataset manipulation occurs when outsiders have the privilege of sending corrupted samples directly to a dataset aggregator, such as a chatbot, spam filter, or database of user profiles. Adversaries may also inject data into systems that rely on federated learning, in which models are trained on a diffuse network of edge devices that communicate periodically with a central server. In this case, users have complete control over the training data and labels seen by their devices, as well as the content of the updates sent to the central server.
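The federated learning threat is the most direct of these to illustrate concretely. Below is a minimal sketch, assuming a FedAvg-style server that simply averages the models returned by its clients and a toy least-squares regression task; one attacker-controlled client sends a boosted "model replacement" update instead of an honest one. All names and parameters here (n_clients, local_update, target_w, the boosting factor) are illustrative assumptions, not constructs from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, lr = 10, 5, 0.1

# Toy linear regression trained with FedAvg-style aggregation: each round,
# every client returns a locally updated model and the server averages them.
true_w = rng.normal(size=dim)   # ground-truth model the honest clients fit
target_w = -true_w              # model the attacker wants the server to adopt
weights = np.zeros(dim)         # global model held by the server

# Honest clients each hold a small clean local dataset.
honest = []
for _ in range(n_clients - 1):
    X = rng.normal(size=(50, dim))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    honest.append((X, y))

def local_update(w, X, y):
    """One gradient step of least-squares regression on a client's local data."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

for _ in range(100):
    updates = [local_update(weights, X, y) for X, y in honest]
    # Malicious client: scale its deviation by n_clients so that, after the
    # server averages all updates, the global model lands near target_w.
    updates.append(weights + n_clients * (target_w - weights))
    weights = np.mean(updates, axis=0)

print("distance to honest solution:", np.linalg.norm(weights - true_w))
print("distance to attacker target:", np.linalg.norm(weights - target_w))
```

The boosting step exploits the averaging itself: because the server divides the sum of n_clients updates by n_clients, an attacker who multiplies its deviation from the current global model by that same factor cancels the dilution, so a single compromised device can steer the aggregated model far from what the honest clients' data supports.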
