Goto

Collaborating Authors

 adversarially


AUnified Game-Theoretic Interpretation of Adversarial Robustness: Supplementary Material

Neural Information Processing Systems

In this section, in order to help readers understand the metric in the paper, we first revisit the definition of the Shapley value [14], which is widely considered as an unbiased estimation of the numerical importance w.r.t. each input variable. In game theory, the complex system is usually represented as a game, where each input variable is taken as a player, and the output of this system is regarded as the total reward of all players. Given a game with multiple players (input variables) N = {1,2,,n}, some players cooperate to pursue a high reward. Thus, the task is to divide the total reward, and fairly assign the divided elementary reward to each individual player. In this way, the elementary reward can be considered as the numerical importance of the corresponding variable to the complex system. Let 2N def= {S|S N}indicate all potential subsets of N. The game v: 2N R is a function, which estimates the overall reward v(S) earned by each specific subset of players S N. In this way, the Shapley value, denoted by ฯ†(i), represents the numerical importance of the player ito the game v. ฯ†(i) = X Using Shapley values to explain DNNs.



Scanning Trojaned Models Using Out-of-Distribution Samples

Neural Information Processing Systems

Scanning for trojan (backdoor) in deep neural networks is crucial due to their significant real-world applications. There has been an increasing focus on developing effective general trojan scanning methods across various trojan attacks. Despite advancements, there remains a shortage of methods that perform effectively without preconceived assumptions about the backdoor attack method. Additionally, we have observed that current methods struggle to identify classifiers trojaned using adversarial training. Motivated by these challenges, our study introduces a novel scanning method named TRODO (TROjan scanning by Detection of adversarial shifts in Out-of-distribution samples).


Tolerant Algorithms for Learning with Arbitrary Covariate Shift

Neural Information Processing Systems

We study the problem of learning under arbitrary distribution shift, where the learner is trained on a labeled set from one distribution but evaluated on a different, potentially adversarially generated test distribution. We focus on two frameworks: [GKKM'20], allowing abstention on adversarially generated parts of the test distribution, and [KSV'23], permitting abstention on the entire test distribution if distribution shift is detected. All prior known algorithms either rely on learning primitives that are computationally hard even for simple function classes, or end up abstaining entirely even in the presence of a tiny amount of distribution shift. We address both these challenges for natural function classes, including intersections of halfspaces and decision trees, and standard training distributions, including Gaussians. For PQ learning, we give efficient learning algorithms, while for TDS learning, our algorithms can tolerate moderate amounts of distribution shift. At the core of our approach is an improved analysis of spectral outlier-removal techniques from learning with nasty noise. Our analysis can (1) handle arbitrarily large fraction of outliers, which is crucial for handling arbitrary distribution shifts, and (2) obtain stronger bounds on polynomial moments of the distribution after outlier removal, yielding new insights into polynomial regression under distribution shifts. Lastly, our techniques lead to novel results for tolerant [RV'23], and learning with nasty noise.




Appendices

Neural Information Processing Systems

The supplementary material is organized as follows. We first discuss additional related work and provide experiment details inSection 2andAppendix Brespectively. Adversarial Defenses: Neural networks trained using standard procedures such as SGD are extremely vulnerable [23] to -bound adversarial attacks such as FGSM [23], PGD [42], CW [11], andMomentum [17];Unrestricted attacks [7,19]cansignificantly degrade model performance as well. Defense strategies based on heuristics such as feature squeezing [82], denoising [80], encoding [10], specialized nonlinearities [83] and distillation [56] have had limited success against stronger attacks [2]. Then, we introduce a noisy version of the5-slab block,whichwelateruseinAppendixD.