Improving Adversarial Robustness via Unlabeled Out-of-Domain Data

Deng, Zhun, Zhang, Linjun, Ghorbani, Amirata, Zou, James

arXiv.org Machine Learning 

Robustness to adversarial attacks has been a major focus in machine learning security [4, 12, 26], and has been intensively studied in the past few years [8, 15, 32]. However, the theoretical understanding of adversarial robustness is still far from satisfactory. Prior work [36] has demonstrated that sample complexity may be one of the obstacles to achieving high robustness under standard learning, which is a major challenge because in many real-world applications labeled examples are few and expensive. To address this challenge, recent works [9, 37] showed that adversarial robustness can be improved by leveraging unlabeled data that come from the same distribution/domain as the original labeled training samples. Nevertheless, this approach is still limited, because it is difficult to ensure that the unlabeled data come from exactly the same distribution as the labeled data. For example, gathering a large number of unlabeled images that follow the same distribution as CIFAR-10 is challenging, since one would have to carefully match the lighting conditions, backgrounds, etc. Meanwhile, out-of-domain unlabeled data can be much easier and cheaper to collect. For instance, we used the Bing search engine to query a small number of keywords and, within hours, generated a new 500k dataset of noisy CIFAR-10 categories; we call this Cheap-10.
