Denoising and Verification Cross-Layer Ensemble Against Black-box Adversarial Attacks

Chow, Ka-Ho, Wei, Wenqi, Wu, Yanzhao, Liu, Ling

arXiv.org Machine Learning 

Deep neural networks (DNNs) have demonstrated impressive performance on many challenging machine learning tasks. However, DNNs are vulnerable to adversarial inputs generated by adding maliciously crafted perturbations to benign inputs. As a growing number of attacks have been reported to generate adversarial inputs of varying sophistication, the defense-attack arms race has accelerated. MODEF intelligently combines an unsupervised model denoising ensemble with a supervised model verification ensemble by quantifying model diversity, aiming to boost the robustness of the target model against adversarial examples. Evaluated using eleven representative attacks on popular benchmark datasets, MODEF achieves remarkable defense success rates compared with existing defense methods, and provides a superior capability of repairing adversarial inputs and making correct predictions with high accuracy in the presence of black-box attacks.

Recent advances in deep neural networks (DNNs) have powered numerous applications in different domains, owing to their outstanding performance compared to traditional machine learning techniques. However, it has been shown that DNNs can be easily fooled by adversarial inputs [1]. This makes them a double-edged sword: their vulnerability to adversarial attacks poses serious threats to many security-critical applications, such as biometric authentication and autonomous driving. As more defenses are proposed, more attacks of varying sophistication are put forward, accelerating the defense-attack arms race. Some even argue that designing new attacks requires much less effort than developing effective defenses. Thus, improving the robustness and defensibility of DNNs against adversarial attacks is crucial.
Adversarial examples are generated by maliciously perturbing benign examples sent to the target DNN model through its prediction API. The goal is to make the target model misclassify, producing either any incorrect prediction (untargeted attack) or a specific incorrect prediction chosen by the attacker (targeted attack).
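The difference between the two attack goals can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: `toy_predict` is a hypothetical stand-in for the target model's prediction API, the perturbation `delta` is hand-picked rather than produced by any real attack, and the function names are invented for this example. None of it is taken from MODEF.

```python
from typing import Callable, Sequence

# Hypothetical type for a black-box prediction API: input vector -> class label.
Predict = Callable[[Sequence[float]], int]


def untargeted_success(predict: Predict, x_adv: Sequence[float], true_label: int) -> bool:
    """Untargeted attack succeeds if the model outputs any label other than the true one."""
    return predict(x_adv) != true_label


def targeted_success(predict: Predict, x_adv: Sequence[float], target_label: int) -> bool:
    """Targeted attack succeeds only if the model outputs the attacker-chosen label."""
    return predict(x_adv) == target_label


def toy_predict(x: Sequence[float]) -> int:
    """Toy 3-class stand-in for the target model (assumption): index of the largest feature."""
    return max(range(len(x)), key=lambda i: x[i])


x_benign = [0.9, 0.5, 0.1]   # classified as class 0 by toy_predict
delta = [-0.3, 0.2, 0.0]     # hand-picked perturbation, for illustration only
x_adv = [a + b for a, b in zip(x_benign, delta)]  # roughly [0.6, 0.7, 0.1]
```

With these toy values, `untargeted_success(toy_predict, x_adv, 0)` holds because the prediction moves away from class 0, and `targeted_success(toy_predict, x_adv, 1)` holds because it lands on class 1. An attacker who wanted class 2 would not have succeeded with this perturbation.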
