to

### Hard Label Black-box Adversarial Attacks in Low Query Budget Regimes

We focus on the problem of black-box adversarial attacks, where the aim is to generate adversarial examples for deep learning models based solely on the output label (hard label) returned for a queried input. We use Bayesian optimization (BO) to develop efficient adversarial attacks that specifically cater to scenarios with low query budgets. Issues with BO's performance in high dimensions are avoided by searching for adversarial examples in a structured low-dimensional subspace. Our proposed approach achieves better performance than state-of-the-art black-box adversarial attacks that require orders of magnitude more queries than ours.
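The abstract does not say which structured subspace is used. As a minimal sketch under that caveat, the example below assumes one plausible choice: the perturbation is parameterized on a coarse grid `z` and repeated tile-wise up to the input resolution, so a query-efficient optimizer such as BO only has to search over the few entries of `z`. The names `upsample_perturbation`, `is_adversarial`, `factor`, and `eps` are hypothetical and not taken from the paper.

```python
import numpy as np

def upsample_perturbation(z, factor, eps):
    """Map a coarse grid z (entries clipped to [-1, 1]) to a full-size
    perturbation with L_inf norm at most eps. Each entry of z is repeated
    over a factor x factor tile (an assumed subspace structure)."""
    z = np.clip(z, -1.0, 1.0)
    return eps * np.kron(z, np.ones((factor, factor)))

def is_adversarial(predict, x, y, z, factor, eps):
    """Spend one hard-label query: does the upsampled perturbation flip
    the label? `predict` is a hypothetical oracle returning only a label."""
    x_adv = np.clip(x + upsample_perturbation(z, factor, eps), 0.0, 1.0)
    return predict(x_adv) != y
```

With a 4x4 input and `factor=2`, the search space shrinks from 16 to 4 dimensions. The binary outcome of `is_adversarial` is the only feedback the optimizer would receive per query in the hard-label setting.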

### RayS: A Ray Searching Method for Hard-label Adversarial Attack

Deep neural networks are vulnerable to adversarial attacks. Among different attack settings, the most challenging yet the most practical one is the hard-label setting, where the attacker only has access to the hard-label output (prediction label) of the target model. Previous attempts are neither effective enough in terms of attack success rate nor efficient enough in terms of query complexity under the widely used $L_\infty$ norm threat model. In this paper, we present the Ray Searching attack (RayS), which greatly improves both the effectiveness and the efficiency of hard-label attacks. Unlike previous works, we reformulate the continuous problem of finding the closest decision boundary into a discrete problem that does not require any zeroth-order gradient estimation. At the same time, all unnecessary searches are eliminated via a fast check step. This significantly reduces the number of queries needed for our hard-label attack. Moreover, interestingly, we found that the proposed RayS attack can also be used as a sanity check for possible "falsely robust" models. On several recently proposed defenses that claim to achieve state-of-the-art robust accuracy, our attack method demonstrates that the current white-box/black-box attacks could still give a false sense of security, and the robust accuracy drop between the most popular PGD attack and the RayS attack could be as large as $28\%$. We believe that our proposed RayS attack could help identify falsely robust models that beat most white-box/black-box attacks.
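The following is a simplified sketch of the two ideas named above, not the RayS algorithm itself: for a fixed discrete direction `d` with entries in $\{-1, +1\}$, the distance to the decision boundary can be found by binary search using only hard-label queries, and a single "fast check" query at the best radius found so far can rule out a direction before any search is spent on it. The function name `boundary_radius`, the tolerance `tol`, and the toy linear oracle in the usage note are illustrative assumptions.

```python
import numpy as np

def boundary_radius(predict, x, y, d, r_best, tol=1e-3):
    """Search along x + r * d for the smallest r that flips the label.

    predict: hypothetical hard-label oracle returning only a class label.
    r_best:  best (smallest) adversarial radius found so far.
    Returns (radius, number_of_queries).
    """
    queries = 1
    # Fast check: if d is not adversarial even at r_best, it cannot
    # improve on the current best, so skip the binary search entirely.
    if predict(x + r_best * d) == y:
        return r_best, queries
    lo, hi = 0.0, r_best  # invariant: label unchanged at lo, flipped at hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        queries += 1
        if predict(x + mid * d) != y:
            hi = mid
        else:
            lo = mid
    return hi, queries
```

As a usage example, take the toy oracle `predict = lambda v: int(w @ v + b > 0)` with `w = np.array([1.0, -1.0, 0.5, -0.5])`, `b = -0.3`, and `x = np.zeros(4)`. Calling `boundary_radius(predict, x, 0, np.sign(w), 1.0)` converges to a radius close to 0.1, while the opposite direction `-np.sign(w)` is rejected after one query.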

### Spanning Attack: Reinforce Black-box Attacks with Unlabeled Data

It has been shown that machine learning models, especially deep neural networks, are vulnerable to small adversarial perturbations, i.e., a small, carefully crafted perturbation added to the input may significantly change the prediction results (Szegedy et al., 2014; Goodfellow et al., 2015; Biggio and Roli, 2018; Fawzi et al., 2018). Therefore, the problem of finding those perturbations, also known as adversarial attacks, has become an important way to evaluate model robustness: the more difficult it is to attack a given model, the more robust it is. Depending on the information an adversary can access, adversarial attacks can be classified into white-box and black-box settings. In the white-box setting, the target model is completely exposed to the attacker, and adversarial perturbations can be easily crafted by exploiting first-order information, i.e., gradients with respect to the input (Carlini and Wagner, 2017; Madry et al., 2018). Despite its efficiency and effectiveness, the white-box setting is an overly strong and pessimistic threat model, and white-box attacks are usually not practical against real-world machine learning systems because gradient information is not exposed to the attacker. Instead, we focus on the problem of black-box attacks, where the model structure and parameters (weights) are not available to the attacker.
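To make the white-box case concrete, here is a minimal sketch of a single gradient-sign step in the style of the fast gradient sign method (Goodfellow et al., 2015), applied to a toy logistic-regression model whose input gradient has a closed form. The model, the weights, and the names `fgsm_step` and `eps` are assumptions for illustration only. A black-box attacker cannot compute `grad` at all, which is exactly the restriction described above.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def fgsm_step(x, y, w, b, eps):
    """One white-box gradient-sign step on a logistic model p = sigmoid(w.x + b).

    For the cross-entropy loss with label y in {0, 1}, the gradient with
    respect to the input is (p - y) * w. Moving by eps in the direction of
    its sign increases the loss under an L_inf budget of eps.
    """
    p = sigmoid(np.dot(w, x) + b)
    grad = (p - y) * w
    return x + eps * np.sign(grad)

def predict(x, w, b):
    """Hard label of the toy model."""
    return int(np.dot(w, x) + b > 0)
```

For example, with `w = np.array([2.0, -1.0])`, `b = 0.0`, and `x = np.array([0.5, 0.2])` labeled `y = 1`, the call `fgsm_step(x, 1, w, b, 0.5)` moves the input across the decision boundary of this toy model.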