Adversarial examples are inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake; they're like optical illusions for machines. In this post we'll show how adversarial examples work across different mediums, and will discuss why securing systems against them can be difficult. At OpenAI, we think adversarial examples are a good aspect of security to work on because they represent a concrete problem in AI safety that can be addressed in the short term, and because fixing them is difficult enough that it requires a serious research effort. To get an idea of what adversarial examples look like, consider this demonstration from Explaining and Harnessing Adversarial Examples: starting with an image of a panda, the attacker adds a small perturbation that has been calculated to make the image be recognized as a gibbon with high confidence. An adversarial input, overlaid on a typical image, can cause a classifier to miscategorize a panda as a gibbon.
Well, it might happen someday, and not in the way you may think. Of course neural networks could be trained to pilot drones or operate other weapons of mass destruction, but even an innocuous (and presently available) network trained to drive a car could be turned to act against its owner. This is because neural networks are extremely susceptible to something called adversarial examples.
With the advent of the era of artificial intelligence(AI), deep neural networks (DNNs) have shown huge superiority over human in image recognition, speech processing, autonomous vehicles and medical diagnosis. However, recent studies indicate that DNNs are vulnerable to adversarial examples (AEs) which are designed by attackers to fool deep learning models. Different from real examples, AEs can hardly be distinguished from human eyes, but mislead the model to predict incorrect outputs and therefore threaten security critical deep-learning applications. In recent years, the generation and defense of AEs have become a research hotspot in the field of AI security. This article reviews the latest research progress of AEs. First, we introduce the concept, cause, characteristic and evaluation metrics of AEs, then give a survey on the state-of-the-art AE generation methods with the discussion of advantages and disadvantages. After that we review the existing defenses and discuss their limitations. Finally, the future research opportunities and challenges of AEs are prospected.
This article is part of our reviews of AI research papers, a series of posts that explore the latest findings in artificial intelligence. We usually don't expect the image of a teacup to turn into a cat when we zoom out. But in the world of artificial intelligence research, strange things can happen. Researchers at Germany's Technische Universität Braunschweig have shown that carefully modifying the pixel values of digital photos can turn them into a completely different image when they are downscaled. What's concerning is the implications these modifications can have for AI algorithms.
In the last post, we discussed an outline of AI powered cyber attacks and their defence strategies. In this post, we will discuss a specific type of attack which is called adversarial attack. Adversarial attacks are not common now because there are not many deep learning systems in production. But soon, we expect that they will increase. Adversarial attacks are easy to describe.