Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou
Neural Information Processing Systems
Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks, or jailbreaking, in which adversaries modify prompts to induce unwanted behavior.