Weak-to-Strong Jailbreaking on Large Language Models
