Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks Andy Zhou

Feb-12-2026, 06:45:59 GMT–Neural Information Processing Systems

Despite advances in AI alignment, large language models (LLMs) remain vulnerable to adversarial attacks or jailbreaking, in which adversaries can modify prompts to induce unwanted behavior.

large language model, machine learning, natural language, (22 more...)

Neural Information Processing Systems

Feb-12-2026, 06:45:59 GMT

Conferences PDF

Country:
- North America > United States
  - Illinois > Champaign County > Urbana (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > Experimental Study (0.93)

Industry:
- Media > News (0.93)
- Information Technology > Security & Privacy (0.88)
- Government (0.88)
- Health & Medicine > Therapeutic Area
  - Infections and Infectious Diseases (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.96)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)

Duplicate Docs Excel Report

Title
46ed503889ab232c21c1162340ee17b2-Paper-Conference.pdf

Similar Docs Excel Report more

Title	Similarity	Source
None found