Improving Adversarial Robustness via Unlabeled Out-of-Domain Data
Deng, Zhun; Zhang, Linjun; Ghorbani, Amirata; Zou, James
Robustness to adversarial attacks has been a major focus in machine learning security [4, 12, 26] and has been intensively studied in the past few years [8, 15, 32]. However, the theoretical understanding of adversarial robustness is still far from satisfactory. Prior work [36] has demonstrated that sample complexity may be one of the obstacles to achieving high robustness under standard learning, which is a significant challenge because in many real-world applications labeled examples are scarce and expensive. To address this challenge, recent works [9, 37] showed that adversarial robustness can be improved by leveraging unlabeled data that come from the same distribution/domain as the original labeled training samples. Nevertheless, this approach is still limited by the difficulty of ensuring that the unlabeled data come from exactly the same distribution as the labeled data. For example, gathering a large number of unlabeled images that follow the same distribution as CIFAR-10 is challenging, since one would have to carefully match the lighting conditions, backgrounds, etc. Meanwhile, out-of-domain unlabeled data can be much easier and cheaper to collect. For instance, we used the Bing search engine to query a small number of keywords and, within hours, generated a new, noisy dataset of 500k images in the CIFAR-10 categories; we call this Cheap-10.
Jun-15-2020
- Country:
- North America > United States
- California > Santa Clara County > Palo Alto (0.04)
- Asia > Middle East
- Jordan (0.04)
- Genre:
- Research Report (0.50)
- Industry:
- Transportation (0.46)
- Information Technology > Security & Privacy (0.34)
- Government > Military (0.34)
- Technology: