We propose LSDAT, an image-agnostic decision-based black-box attack that exploits low-rank and sparse decomposition (LSD) to dramatically reduce the number of queries and achieve superior fooling rates compared to the state-of-the-art decision-based methods under given imperceptibility constraints. LSDAT crafts perturbations in the low-dimensional subspace formed by the sparse component of the input sample and that of an adversarial sample to obtain query-efficiency. The specific perturbation of interest is obtained by traversing the path between the input and adversarial sparse components. It is set forth that the proposed sparse perturbation is the most aligned sparse perturbation with the shortest path from the input sample to the decision boundary for some initial adversarial sample (the best sparse approximation of shortest path, likely to fool the model). Theoretical analyses are provided to justify the functionality of LSDAT. Unlike other dimensionality reduction based techniques aimed at improving query efficiency (e.g, ones based on FFT), LSD works directly in the image pixel domain to guarantee that non-$\ell_2$ constraints, such as sparsity, are satisfied. LSD offers better control over the number of queries and provides computational efficiency as it performs sparse decomposition of the input and adversarial images only once to generate all queries. We demonstrate $\ell_0$, $\ell_2$ and $\ell_\infty$ bounded attacks with LSDAT to evince its efficiency compared to baseline decision-based attacks in diverse low-query budget scenarios as outlined in the experiments.
Deep neural networks are vulnerable to adversarial attacks. Among different attack settings, the most challenging yet the most practical one is the hard-label setting where the attacker only has access to the hard-label output (prediction label) of the target model. Previous attempts are neither effective enough in terms of attack success rate nor efficient enough in terms of query complexity under the widely used $L_\infty$ norm threat model. In this paper, we present the Ray Searching attack (RayS), which greatly improves the hard-label attack effectiveness as well as efficiency. Unlike previous works, we reformulate the continuous problem of finding the closest decision boundary into a discrete problem that does not require any zeroth-order gradient estimation. In the meantime, all unnecessary searches are eliminated via a fast check step. This significantly reduces the number of queries needed for our hard-label attack. Moreover, interestingly, we found that the proposed RayS attack can also be used as a sanity check for possible "falsely robust" models. On several recently proposed defenses that claim to achieve the state-of-the-art robust accuracy, our attack method demonstrates that the current white-box/black-box attacks could still give a false sense of security and the robust accuracy drop between the most popular PGD attack and RayS attack could be as large as $28\%$. We believe that our proposed RayS attack could help identify falsely robust models that beat most white-box/black-box attacks.
We study the most practical problem setup for evaluating adversarial robustness of a machine learning system with limited access: the hard-label black-box attack setting for generating adversarial examples, where limited model queries are allowed and only the decision is provided to a queried data input. Several algorithms have been proposed for this problem but they typically require huge amount (>20,000) of queries for attacking one example. Among them, one of the state-of-the-art approaches (Cheng et al., 2019) showed that hard-label attack can be modeled as an optimization problem where the objective function can be evaluated by binary search with additional model queries, thereby a zeroth order optimization algorithm can be applied. In this paper, we adopt the same optimization formulation but propose to directly estimate the sign of gradient at any direction instead of the gradient itself, which enjoys the benefit of single query. Using this single query oracle for retrieving sign of directional derivative, we develop a novel query-efficient Sign-OPT approach for hard-label black-box attack. We provide a convergence analysis of the new algorithm and conduct experiments on several models on MNIST, CIFAR-10 and ImageNet. We find that Sign-OPT attack consistently requires 5X to 10X fewer queries when compared to the current state-of-the-art approaches, and usually converges to an adversarial example with smaller perturbation.
We consider adversarial examples in the black-box decision-based scenario. Here, an attacker has access to the final classification of a model, but not its parameters or softmax outputs. Most attacks for this scenario are based either on transferability, which is unreliable, or random sampling, which is often slow. Focusing on the latter, we propose to improve the efficiency of sampling-based attacks with prior beliefs about the target domain. We identify two such priors, image frequency and surrogate gradients, and discuss how to integrate them into a unified sampling procedure. We then formulate the Biased Boundary Attack, which achieves a drastic speedup over the original Boundary Attack. We demonstrate the effectiveness of our approach against an ImageNet classifier. We also showcase a targeted attack for the Google Cloud Vision API, where we craft convincing examples with just a few hundred queries. Finally, we demonstrate that our approach outperforms the state of the art when facing strong defenses: Our attack scored second place in the targeted attack track of the NeurIPS 2018 Adversarial Vision Challenge.
We study the problem of attacking a machine learning model in the hard-label black-box setting, where no model information is revealed except that the attacker can make queries to probe the corresponding hard-label decisions. This is a very challenging problem since the direct extension of state-of-the-art white-box attacks (e.g., CW or PGD) to the hard-label black-box setting will require minimizing a non-continuous step function, which is combinatorial and cannot be solved by a gradient-based optimizer. The only current approach is based on random walk on the boundary, which requires lots of queries and lacks convergence guarantees. We propose a novel way to formulate the hard-label black-box attack as a real-valued optimization problem which is usually continuous and can be solved by any zeroth order optimization algorithm. For example, using the Randomized Gradient-Free method, we are able to bound the number of iterations needed for our algorithm to achieve stationary points. We demonstrate that our proposed method outperforms the previous random walk approach to attacking convolutional neural networks on MNIST, CIFAR, and ImageNet datasets. More interestingly, we show that the proposed algorithm can also be used to attack other discrete and non-continuous machine learning models, such as Gradient Boosting Decision Trees (GBDT).