Radioactive data: tracing through training

Sablayrolles, Alexandre, Douze, Matthijs, Schmid, Cordelia, Jégou, Hervé

Feb-3-2020–arXiv.org Machine Learning

We want to detect whether a particular image dataset has been used to train a model. We propose a new technique, radioactive data, that makes imperceptible changes to this dataset such that any model trained on it will bear an identifiable mark. The mark is robust to strong variations such as different architectures or optimization methods. Given a trained model, our technique detects the use of radioactive data and provides a level of confidence (p-value). Our experiments on large-scale benchmarks (ImageNet), using standard architectures (ResNet-18, VGG-16, DenseNet-121) and training procedures, show that we can detect usage of radioactive data with high confidence (p < 10^-4) even when only 1% of the data used to train our model is radioactive. Our method is robust to data augmentation and the stochasticity of deep network optimization. As a result, it offers a much higher signal-to-noise ratio than data poisoning and backdoor methods.
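The abstract does not say how the p-value is computed. As a rough illustration only, the sketch below assumes one plausible form of such a test: the mark is a secret direction in feature space, and detection measures the cosine similarity between that direction and a classifier weight vector. Under the null hypothesis that the weight vector points in a uniformly random direction in d dimensions, the cosine has density proportional to (1 - t^2)^((d-3)/2) on [-1, 1], so the p-value is the upper tail of that density. The function names, the choice of test, and the numerical integration are all assumptions made for this example, not details taken from the abstract.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors given as sequences."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_p_value(tau, d, steps=20000):
    """P(cosine >= tau) when one vector is uniform on the unit sphere in R^d.

    Hypothetical helper for illustration: integrates the null density
    f(t) = Gamma(d/2) / (sqrt(pi) * Gamma((d-1)/2)) * (1 - t^2)^((d-3)/2)
    from tau to 1 with Simpson's rule. Requires d >= 3 and an even `steps`.
    """
    if d < 3:
        raise ValueError("d must be at least 3")
    if not -1.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [-1, 1]")
    if steps % 2:
        raise ValueError("steps must be even")
    if tau == 1.0:
        return 0.0

    # Normalising constant, computed in log space for numerical stability.
    log_norm = (math.lgamma(d / 2) - 0.5 * math.log(math.pi)
                - math.lgamma((d - 1) / 2))
    norm = math.exp(log_norm)
    exponent = (d - 3) / 2

    def density(t):
        base = max(0.0, 1.0 - t * t)
        return norm * base ** exponent

    # Composite Simpson's rule on [tau, 1].
    h = (1.0 - tau) / steps
    total = density(tau) + density(1.0)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(tau + i * h)
    return total * h / 3
```

With this kind of test, a small observed cosine can already be significant in high dimension: for d = 512, a cosine of 0.2 falls well below the p < 10^-4 level quoted in the abstract, because random directions in high-dimensional space are almost always nearly orthogonal.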

classifier, dataset, radioactive data

Country:
- Europe > Poland (0.04)

Genre:
- Research Report > Experimental Study (0.52)

Industry:
- Information Technology > Security & Privacy (1.00)

Technology:
- Information Technology
  - Security & Privacy (1.00)
  - Data Science (1.00)
  - Sensing and Signal Processing > Image Processing (0.93)
  - Artificial Intelligence
    - Vision (1.00)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (0.95)
