Introduction to Adversarial Machine Learning
Here we are in 2019, where we keep seeing state-of-the-art (from now on SOTA) classifiers being published every day: some propose entirely new architectures, others propose tweaks needed to train a classifier more accurately. To keep things simple, let's talk about plain image classifiers, which have come a long way from GoogLeNet to AmoebaNet-A, the latter reaching 83% top-1 accuracy on ImageNet.

Now, if we take an image and change a few pixels (not randomly), what looks identical to the human eye can cause these SOTA classifiers to fail miserably! I have a few benchmarks here; you can see how badly these classifiers fail even under the simplest perturbations. This is an alarming situation for the Machine Learning community, especially as we move closer and closer to adopting these SOTA models in real-world applications.

Let's discuss a few real-life examples to understand the seriousness of the situation. Tesla has come a long way, and many self-driving car companies are trying to keep pace with them. Recently, however, it was shown that the SOTA models used by Tesla can be fooled by placing simple stickers (adversarial patches) on the road, which the car interprets as the lane diverging, causing it to drive into oncoming traffic. The severity of this situation is very much underestimated, even by Elon Musk (CEO of Tesla) himself, while I believe Andrej Karpathy (Head of AI at Tesla) is quite aware of how dangerous it is. This thread from Jeremy Howard (co-founder of fast.ai) says it all:

"In this clip, @elonmusk tells @lexfridman that adversarial examples are trivially easily fixed. @karpathy is that your experience at @tesla? @catherineols is that what the neurips adversarial challenge found?"

A recently released paper showed that a stop sign manipulated with adversarial patches caused a SOTA model to start "thinking" it was a speed limit sign. Sounds scary, doesn't it?
Not to mention that these attacks can be targeted, making the network predict whatever the attacker wants! Imagine an attacker manipulating road signs in such a way that self-driving cars break traffic rules.
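To make the idea of a small, deliberately chosen perturbation concrete, here is a minimal sketch of the Fast Gradient Sign Method (FGSM), the classic attack behind many of these examples. Everything below is illustrative, not a real SOTA network: a toy logistic-regression classifier on 2-D points stands in for the image classifier, and the epsilon is large only because the toy problem is 2-D.

```python
import numpy as np

# Toy, illustrative setup: a logistic-regression "classifier" on 2-D points.
rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Train a tiny linear classifier on two well-separated Gaussian blobs.
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
w, b = np.zeros(2), 0.0
for _ in range(500):                          # plain gradient descent
    p = sigmoid(X @ w + b)
    w -= 0.1 * (X.T @ (p - y)) / len(y)
    b -= 0.1 * np.mean(p - y)

def fgsm(x, label, eps):
    """One FGSM step: move eps in the sign of the loss gradient w.r.t. the
    input, i.e. the direction that most increases the classifier's loss
    under an L-infinity budget."""
    grad_x = (sigmoid(x @ w + b) - label) * w  # d(logistic loss)/dx
    return x + eps * np.sign(grad_x)

x = np.array([2.5, 2.5])                       # a clearly class-1 point
x_adv = fgsm(x, 1, eps=3.0)

print(sigmoid(x @ w + b) > 0.5)                # True  (predicts class 1)
print(sigmoid(x_adv @ w + b) > 0.5)            # False (prediction flipped)
```

On images the same recipe applies pixel-wise with a tiny epsilon (a few intensity levels), which is why the perturbed image looks unchanged to us while the classifier's prediction flips.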
Nov-3-2019, 22:12:57 GMT