Spanning Attack: Reinforce Black-box Attacks with Unlabeled Data
Lu Wang, Huan Zhang, Jinfeng Yi, Cho-Jui Hsieh, Yuan Jiang
It has been shown that machine learning models, especially deep neural networks, are vulnerable to small adversarial perturbations, i.e., a small carefully crafted perturbation added to the input may significantly change the prediction results (Szegedy et al., 2014; Goodfellow et al., 2015; Biggio and Roli, 2018; Fawzi et al., 2018). Therefore, the problem of finding such perturbations, also known as adversarial attacks, has become an important way to evaluate model robustness: the more difficult it is to attack a given model, the more robust it is. Depending on the information an adversary can access, adversarial attacks are classified into white-box and black-box settings. In the white-box setting, the target model is completely exposed to the attacker, and adversarial perturbations can be easily crafted by exploiting first-order information, i.e., gradients with respect to the input (Carlini and Wagner, 2017; Madry et al., 2018). Despite its efficiency and effectiveness, the white-box setting is an overly strong and pessimistic threat model, and white-box attacks are usually not practical against real-world machine learning systems because the gradient information is not exposed. Instead, we focus on the problem of black-box attacks, where the model structure and parameters (weights) are not available to the attacker.
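To make the white-box/black-box contrast concrete, the following is a minimal sketch of a gradient-based white-box step in the style of FGSM (Goodfellow et al., 2015). It is illustrative only and not the method of this paper; `model`, `x`, `y`, and `epsilon` are assumed placeholders for a PyTorch classifier, an input batch, its labels, and a perturbation budget.

```python
import torch
import torch.nn.functional as F

def fgsm_perturb(model, x, y, epsilon=0.03):
    """One-step white-box perturbation: move the input in the direction of
    the sign of the loss gradient with respect to the input (FGSM-style)."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    # x_adv.grad is exactly the first-order information that a black-box
    # attacker cannot access and would instead have to estimate from queries.
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```

In the black-box setting considered here, this input gradient is unavailable and any comparable signal must be obtained through queries to the target model.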
May-11-2020