Poison Frogs! Targeted Clean-Label Poisoning Attacks on Neural Networks

Neural Information Processing Systems

Data poisoning is an attack on machine learning models wherein the attacker adds examples to the training set to manipulate the behavior of the model at test time. This paper explores poisoning attacks on neural networks. The proposed attacks are "clean-label": they do not require the attacker to have any control over the labeling of the training data. They are also targeted: they control the behavior of the classifier on a specific test instance without degrading overall classifier performance. For example, an attacker could add a seemingly innocuous, properly labeled image to the training set of a face recognition engine and thereby control the identity assigned to a chosen person at test time. Because the attacker does not need to control the labeling function, poisons could be entered into the training set simply by being posted online and scraped by a data collection bot. We present an optimization-based method for crafting poisons, and show that a single poison image can control classifier behavior when transfer learning is used. For full end-to-end training, we present a "watermarking" strategy that makes poisoning reliable using multiple (approx.
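The crafting objective behind such clean-label attacks can be sketched as a feature collision: find a poison that stays close to an innocuous base image in input space while matching the target in feature space. A minimal NumPy illustration under simplifying assumptions (a fixed linear feature map `W` stands in for a pretrained network's penultimate layer; all names and constants here are hypothetical, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_feat = 32, 8
W = rng.standard_normal((d_feat, d_in))   # frozen feature extractor f(x) = W @ x
base = rng.standard_normal(d_in)          # innocuous base image (keeps its clean label)
target = rng.standard_normal(d_in)        # test instance the attacker wants to control

beta = 0.1                                # weight on staying close to the base image
lr = 0.005
x = base.copy()                           # initialize the poison at the base image

def loss(x):
    # || f(x) - f(target) ||^2  +  beta * || x - base ||^2
    return np.sum((W @ x - W @ target) ** 2) + beta * np.sum((x - base) ** 2)

start = loss(x)
for _ in range(500):
    # gradient of the convex quadratic objective above
    grad = 2 * W.T @ (W @ x - W @ target) + 2 * beta * (x - base)
    x -= lr * grad

# The poison now collides with the target in feature space while the
# beta term keeps it regularized toward the base image in input space.
feat_gap = np.linalg.norm(W @ x - W @ target)
```

The `beta` term is what keeps the poison visually near the base image, so a human labeler would still assign it the base class.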


Community service

MIT Technology Review

The bird is a beautiful silver-gray, and as she dies twitching in the lasernet I'm grateful for two things: First, that she didn't make a sound. Second, that this will be the very last time. They're called corpse doves--because the darkest part of their gray plumage surrounds the lighter part, giving the impression that skeleton faces are peeking out from behind trash cans and bushes--and their crime is having the ability to carry diseases that would be compatible with humans. I open my hand, triggering the display from my imprinted handheld, and record an image to verify the elimination. A ding from my palm lets me know I've reached my quota for the day and, with that, the year. I'm tempted to give this one a send-off, a real burial with holy words and some flowers, but then I hear a pack of streetrats hooting beside me. My city-issued vest is reflective and nanopainted so it projects a slight glow. I don't know if it's to keep us safe like they say, or if it's just that so many of us are ex-cons working court-ordered labor, and civilians want to be able to keep an eye on us. Either way, everyone treats us like we're invisible--everyone except children.




Safety-Efficacy Trade Off: Robustness against Data-Poisoning

Granziol, Diego

arXiv.org Machine Learning

Backdoor and data poisoning attacks can achieve high attack success while evading existing spectral and optimisation-based defences. We show that this behaviour is not incidental, but arises from a fundamental geometric mechanism in input space. Using kernel ridge regression as an exact model of wide neural networks, we prove that clustered dirty-label poisons induce a rank-one spike in the input Hessian whose magnitude scales quadratically with attack efficacy. Crucially, for nonlinear kernels we identify a near-clone regime in which poison efficacy remains order one while the induced input curvature vanishes, making the attack provably spectrally undetectable. We further show that input gradient regularisation contracts poison-aligned Fisher and Hessian eigenmodes under gradient flow, yielding an explicit and unavoidable safety-efficacy trade-off by reducing data-fitting capacity. For exponential kernels, this defence admits a precise interpretation as an anisotropic high-pass filter that increases the effective length scale and suppresses near-clone poisons. Extensive experiments on linear models and deep convolutional networks across MNIST, CIFAR-10, and CIFAR-100 validate the theory, demonstrating consistent lags between attack success and spectral visibility, and showing that regularisation and data augmentation jointly suppress poisoning. Our results establish when backdoors are inherently invisible, and provide the first end-to-end characterisation of poisoning, detectability, and defence through input-space curvature.
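The dirty-label cluster mechanism the abstract describes can be seen in a toy 1-D kernel ridge regression (a hedged sketch, not the paper's construction; the kernel, length scale, ridge strength, and poison budget are arbitrary choices): a tight cluster of flipped-label poisons flips the prediction at a chosen target input while leaving a distant probe point essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(A, B, ell=0.05):
    # squared-exponential kernel; a short length scale makes the fit local
    return np.exp(-((A[:, None] - B[None, :]) ** 2) / (2 * ell ** 2))

def krr_predict(X_tr, y_tr, X_te, lam=0.1):
    # standard kernel ridge regression: alpha = (K + lam I)^-1 y
    K = rbf(X_tr, X_tr)
    alpha = np.linalg.solve(K + lam * np.eye(len(X_tr)), y_tr)
    return rbf(X_te, X_tr) @ alpha

# Clean 1-D training set: label +1 left of 0, -1 right of 0.
X = rng.uniform(-1, 1, 60)
y = np.where(X < 0, 1.0, -1.0)

target = np.array([0.7])    # test input the attacker wants flipped
probe = np.array([-0.5])    # unrelated input, to check overall behaviour
clean_t = krr_predict(X, y, target)
clean_p = krr_predict(X, y, probe)

# Clustered dirty-label poisons: a tight cluster at the target, label flipped.
n_poison = 20
X_atk = np.concatenate([X, 0.7 + 0.01 * rng.standard_normal(n_poison)])
y_atk = np.concatenate([y, np.full(n_poison, 1.0)])
atk_t = krr_predict(X_atk, y_atk, target)
atk_p = krr_predict(X_atk, y_atk, probe)
```

Because the poisons sit in a near-duplicate cluster, their effect is concentrated at the target: the prediction there flips sign, while the probe point far away is essentially untouched.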


Manipulating SGD with Data Ordering Attacks

Neural Information Processing Systems

Machine learning is vulnerable to a wide variety of attacks. It is now well understood that by changing the underlying data distribution, an adversary can poison the model trained with it or introduce backdoors. In this paper we present a novel class of training-time attacks that require no changes to the underlying dataset or model architecture, but instead only change the order in which data are supplied to the model. In particular, we find that the attacker can either prevent the model from learning, or poison it to learn behaviours specified by the attacker. Furthermore, we find that even a single adversarially-ordered epoch can be enough to slow down model learning, or even to reset all of the learning progress. Indeed, the attacks presented here are not specific to the model or dataset, but rather target the stochastic nature of modern learning procedures. We extensively evaluate our attacks on computer vision and natural language benchmarks to find that the adversary can disrupt model training and even introduce backdoors.
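The order-sensitivity these attacks exploit is easy to see in a toy setting (a hypothetical scalar example, not the paper's construction): one SGD epoch over the same dataset ends at very different parameters depending purely on the order in which examples are supplied, because each step geometrically discounts earlier examples.

```python
import numpy as np

rng = np.random.default_rng(0)

def sgd_epoch(w, data, lr=0.5):
    # one pass of SGD on the scalar least-squares loss (w - y)^2 / 2;
    # each step is w <- (1 - lr) * w + lr * y, so later examples weigh more
    for yv in data:
        w = (1 - lr) * w + lr * yv
    return w

# A balanced dataset: fifty -1 targets and fifty +1 targets.
data = np.concatenate([np.full(50, -1.0), np.full(50, 1.0)])

# Benign baseline: a random shuffle lands somewhere between the two clusters.
w_random = sgd_epoch(0.0, rng.permutation(data))

# Adversarial ordering: the identical dataset, but all +1 examples last,
# drags the final weight toward the attacker-chosen value.
w_attacked = sgd_epoch(0.0, np.sort(data))
```

No example was added, removed, or relabeled; only the supply order changed, yet the attacker can steer the end-of-epoch weight to either extreme by choosing which cluster comes last.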


Appendix A Poison crafting curves

Neural Information Processing Systems

Our poisons in the main paper were all crafted with 60 outer steps, also called craft steps. As a testbed, we consider poison frogs attacking a target airplane with a poison budget of 10%. The blue line in Figure 1 (top) shows the adversarial loss, averaged over all surrogate models, during the crafting stage. It decreases rapidly up to craft step 25 and then plateaus. It never sinks below zero, which means that inserting these poisons into a minibatch will not, on average, cause the model to misclassify the target two look-ahead SGD steps later.