Spectral Signatures in Backdoor Attacks

Tran, Brandon, Li, Jerry, Madry, Aleksander

Neural Information Processing Systems 

A recent line of work has uncovered a new form of data poisoning: so-called backdoor attacks. These attacks are particularly dangerous because they do not affect a network's behavior on typical, benign data. Rather, the network only deviates from its expected output when triggered by an adversary's planted perturbation. In this paper, we identify a new property of all known backdoor attacks, which we call spectral signatures. This property allows us to utilize tools from robust statistics to thwart the attacks.