When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search

Feb-10-2026, 14:12:09 GMT–Neural Information Processing Systems

These attacks leverage either in-context learning [6, 35, 66, 28, 5] or genetic methods [65, 27, 32]. Specifically, in-context learning attacks repeatedly query a separate helper LLM to generate and refine jailbreaking prompts. As shown in Section 4, relying purely on in-context learning has a limited ability to continuously refine the prompts. Genetic method-based attacks design different mutators that use the helper LLM to modify the jailbreaking prompts. They refine the prompts by iteratively selecting the most promising prompts as the seeds for the next round.
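The seed-selection loop that genetic method-based attacks rely on can be illustrated with a generic evolutionary search over strings. This is a minimal sketch under stated assumptions, not the paper's method or any cited attack: the mutators `substitute`, `insert` and `delete` are hypothetical toy character edits standing in for the helper-LLM mutators, and `toy_score` is a hypothetical objective (similarity to a fixed target string) standing in for whatever evaluation an attack uses to rank prompts. Only the structure carries over: mutate every seed, pool parents and children, keep the top-scoring candidates as the next round's seeds.

```python
import random
from typing import Callable, Sequence

# Hypothetical mutator signature: (prompt, rng) -> mutated prompt.
Mutator = Callable[[str, random.Random], str]
ALPHABET = "abcdefghijklmnopqrstuvwxyz "


def substitute(s: str, rng: random.Random) -> str:
    """Toy mutator: replace one character (stand-in for a helper-LLM rewrite)."""
    if not s:
        return rng.choice(ALPHABET)
    i = rng.randrange(len(s))
    return s[:i] + rng.choice(ALPHABET) + s[i + 1:]


def insert(s: str, rng: random.Random) -> str:
    """Toy mutator: insert one random character at a random position."""
    i = rng.randrange(len(s) + 1)
    return s[:i] + rng.choice(ALPHABET) + s[i:]


def delete(s: str, rng: random.Random) -> str:
    """Toy mutator: drop one character, never emptying the string."""
    if len(s) <= 1:
        return s
    i = rng.randrange(len(s))
    return s[:i] + s[i + 1:]


def genetic_search(
    seeds: Sequence[str],
    mutators: Sequence[Mutator],
    score: Callable[[str], float],
    rounds: int = 200,
    pool_size: int = 8,
    seed: int = 0,
) -> str:
    """Generic elitist GA: mutate seeds, keep the best candidates as next seeds."""
    rng = random.Random(seed)
    pool = list(dict.fromkeys(seeds))  # dedupe while preserving order
    for _ in range(rounds):
        # Apply one randomly chosen mutator to every current seed.
        children = [rng.choice(mutators)(p, rng) for p in pool]
        # Parents stay in the candidate set, so the best score never decreases.
        candidates = list(dict.fromkeys(pool + children))
        candidates.sort(key=score, reverse=True)
        pool = candidates[:pool_size]
    return pool[0]


TARGET = "hello world"  # hypothetical toy objective, not a real evaluation


def toy_score(s: str) -> int:
    """Count position-wise matches with TARGET, penalizing length mismatch."""
    matches = sum(a == b for a, b in zip(s, TARGET))
    return matches - abs(len(s) - len(TARGET))


best = genetic_search(["jello", "word"], [substitute, insert, delete], toy_score)
print(best, toy_score(best))
```

Keeping the parents in the candidate pool (elitism) is one common design choice; it guarantees the best score is non-decreasing across rounds, at the cost of possibly converging on a narrow set of seeds.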

large language model, machine learning, natural language


Industry:
- Information Technology > Security & Privacy (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (0.48)


Title
When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search Xuan Chen
