Dataset Security for Machine Learning: Data Poisoning, Backdoor Attacks, and Defenses

Micah Goldblum, Dimitris Tsipras, Chulin Xie, Xinyun Chen, Avi Schwarzschild, Dawn Song, Aleksander Madry, Bo Li, Tom Goldstein

arXiv.org Artificial Intelligence 

Traditional approaches to computer security isolate systems from the outside world through a combination of firewalls, passwords, data encryption, and other access control measures. In contrast, dataset creators often invite the outside world in: data-hungry neural network models are built by harvesting information from anonymous and unverified sources on the web. Such open-world dataset creation methods can be exploited in several ways. Outsiders can passively manipulate datasets by placing corrupted data on the web and waiting for data-harvesting bots to collect it. Active dataset manipulation occurs when outsiders have the privilege of sending corrupted samples directly to a dataset aggregator, such as a chatbot, spam filter, or database of user profiles. Adversaries may also inject data into systems that rely on federated learning, in which models are trained on a diffuse network of edge devices that communicate periodically with a central server. In this case, users have complete control over the training data and labels seen by their devices, as well as the content of the updates sent to the central server.
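The federated learning threat is the most direct of these to illustrate concretely. Below is a minimal sketch, assuming a FedAvg-style server that simply averages the models returned by its clients and a toy least-squares regression task; one attacker-controlled client sends a boosted "model replacement" update instead of an honest one. All names and parameters here (n_clients, local_update, target_w, the boosting factor) are illustrative assumptions, not constructs from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, lr = 10, 5, 0.1

# Toy linear regression trained with FedAvg-style aggregation: each round,
# every client returns a locally updated model and the server averages them.
true_w = rng.normal(size=dim)   # ground-truth model the honest clients fit
target_w = -true_w              # model the attacker wants the server to adopt
weights = np.zeros(dim)         # global model held by the server

# Honest clients each hold a small clean local dataset.
honest = []
for _ in range(n_clients - 1):
    X = rng.normal(size=(50, dim))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    honest.append((X, y))

def local_update(w, X, y):
    """One gradient step of least-squares regression on a client's local data."""
    grad = X.T @ (X @ w - y) / len(y)
    return w - lr * grad

for _ in range(100):
    updates = [local_update(weights, X, y) for X, y in honest]
    # Malicious client: scale its deviation by n_clients so that, after the
    # server averages all updates, the global model lands near target_w.
    updates.append(weights + n_clients * (target_w - weights))
    weights = np.mean(updates, axis=0)

print("distance to honest solution:", np.linalg.norm(weights - true_w))
print("distance to attacker target:", np.linalg.norm(weights - target_w))
```

The boosting step exploits the averaging itself: because the server divides the sum of n_clients updates by n_clients, an attacker who multiplies its deviation from the current global model by that same factor cancels the dilution, so a single compromised device can steer the aggregated model far from what the honest clients' data supports.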
