Defending Against Unforeseen Failure Modes with Latent Adversarial Training
