Defending Object Detectors against Patch Attacks with Out-of-Distribution Smoothing

Ryan Feng, Neal Mangaokar, Jihye Choi, Somesh Jha, Atul Prakash

arXiv.org Artificial Intelligence 

Machine learning models today remain vulnerable to adversarial examples [11, 27, 1, 2, 9, 10, 29], where perturbed inputs lead to unexpected model outputs. Such adversarial examples take a variety of forms, including digital attacks [11, 27] and physical attacks [9, 2, 10], where the attack is physically realized in the real world as printed stickers [9, 10] or 3D objects [2]. Among physical attacks, the patch attack has drawn increasing interest because it can practically inject an attack by inserting a printed physical patch into the scene. A variety of patch attack defenses have been proposed, including several certified [15, 5, 33, 32, 34, 19] and empirical [35, 20, 16, 37, 28, 4] defenses, many of which are designed around identifying and then removing the patch. Such defenses rely on accurately identifying the patch attack without false positives and then removing the effects of identified patches with techniques such as blacking them out [20] or setting them to the image's mean color [35]. Our first key contribution is to unify these types of defenses under a general framework called OODSmoother (Section 3), as shown in Figure 1.
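To make the identify-then-remove operation concrete, the sketch below shows how a detected patch region might be neutralized by either blacking it out or filling it with the image's mean color, the two removal strategies cited above. This is a minimal illustration under assumptions, not the paper's implementation: the `remove_patch` function and the `patch_mask` input are hypothetical, with the mask assumed to come from some upstream patch detector.

```python
import numpy as np

def remove_patch(image: np.ndarray, patch_mask: np.ndarray, mode: str = "black") -> np.ndarray:
    """Neutralize a detected adversarial patch region (illustrative sketch).

    image:      H x W x 3 float array in [0, 1].
    patch_mask: H x W boolean array, True where a patch was detected
                (assumed to be produced by an upstream patch detector).
    mode:       "black" zeroes out the region; "mean" fills it with the
                image's per-channel mean color.
    """
    cleaned = image.copy()
    if mode == "black":
        cleaned[patch_mask] = 0.0                      # black out the patch region
    elif mode == "mean":
        cleaned[patch_mask] = image.mean(axis=(0, 1))  # fill with the image's mean color
    else:
        raise ValueError(f"unknown mode: {mode}")
    return cleaned

# Usage: pass the cleaned image to the downstream object detector.
# cleaned = remove_patch(image, patch_mask, mode="mean")
```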