from Benign import Toxic: Jailbreaking the Language Model via Adversarial Metaphors

Open in new window