Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou
–Neural Information Processing Systems
Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks, or jailbreaking, in which adversaries modify prompts to induce unwanted behavior.
Feb-12-2026, 06:45:59 GMT
- Country:
- North America > United States
- Illinois > Champaign County > Urbana (0.04)
- Asia > Middle East
- Jordan (0.04)
- Genre:
- Research Report > Experimental Study (0.93)
- Industry:
- Media > News (0.93)
- Information Technology > Security & Privacy (0.88)
- Government (0.88)
- Health & Medicine > Therapeutic Area
- Technology: