OptimizingRelevanceMapsofVision TransformersImprovesRobustness

Neural Information Processing Systems 

It has been observed that visual classification models often rely mostly on spurious cues such as the image background, which hurts their robustness to distribution changes. To alleviate this shortcoming, we propose to monitor the model's relevancy signal and direct the model to base its prediction on the foregroundobject.