Exact Adversarial Attack to Image Captioning via Structured Output Learning with Latent Variables