Understanding Jailbreak Success: A Study of Latent Space Dynamics in Large Language Models
