Latent-space adversarial training with post-aware calibration for defending large language models against jailbreak attacks
