Mohammad Mahmoody
Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution
Dimitrios Diochnos, Saeed Mahloujifar, Mohammad Mahmoody
As the current literature contains multiple definitions of adversarial risk and robustness, we start by giving a taxonomy for these definitions based on their direct goals; we identify one of them as the one guaranteeing misclassification by pushing the instances to the error region. We then study some classic algorithms for learning monotone conjunctions and compare their adversarial robustness under different definitions by attacking the hypotheses using instances drawn from the uniform distribution. We observe that sometimes these definitions lead to significantly different bounds. Thus, this study advocates for the use of the error-region definition, even though other definitions, in other contexts with context-dependent assumptions, may coincide with the error-region definition.
Adversarial Risk and Robustness: General Definitions and Implications for the Uniform Distribution
Dimitrios Diochnos, Saeed Mahloujifar, Mohammad Mahmoody
As the current literature contains multiple definitions of adversarial risk and robustness, we start by giving a taxonomy for these definitions based on their direct goals; we identify one of them as the one guaranteeing misclassification by pushing the instances to the error region. We then study some classic algorithms for learning monotone conjunctions and compare their adversarial robustness under different definitions by attacking the hypotheses using instances drawn from the uniform distribution. We observe that sometimes these definitions lead to significantly different bounds. Thus, this study advocates for the use of the error-region definition, even though other definitions, in other contexts with context-dependent assumptions, may coincide with the error-region definition.
Empirically Measuring Concentration: Fundamental Limits on Intrinsic Robustness
Saeed Mahloujifar, Xiao Zhang, Mohammad Mahmoody, David Evans
Many recent works have shown that adversarial examples that fool classifiers can be found by minimally perturbing a normal input. Recent theoretical results, starting with Gilmer et al. (2018b), show that if the inputs are drawn from a concentrated metric probability space, then adversarial examples with small perturbation are inevitable. A concentrated space has the property that any subset with Ω(1) (e.g., 1/100) measure, according to the imposed distribution, has small distance to almost all (e.g., 99/100) of the points in the space. It is not clear, however, whether these theoretical results apply to actual distributions such as images. This paper presents a method for empirically measuring and bounding the concentration of a concrete dataset which is proven to converge to the actual concentration.