Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks Andy Zhou

Neural Information Processing Systems 

Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries can modify prompts to induce unwanted behavior.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found