When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
–Neural Information Processing Systems
These attacks leverage either in-context learning [6, 35, 66, 28, 5] or genetic methods [65, 27, 32]. Specifically, in-context learning attacks keep querying another helper LLM to generate and refine jailbreaking prompts. As shown in Section 4, purely relying on in-context learning has a limited ability to continuously refine the prompts. Genetic method-based attacks design different mutators that leverage the helper LLM to modify the jailbreaking prompts. They refine the prompts by iteratively selecting the promising prompts as the seeds for the next round.
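The select-and-mutate loop described for genetic methods can be illustrated with a minimal, generic sketch. It is not the implementation from any of the cited works: the function name `genetic_refine`, its parameters, and the toy mutators and scoring function are all hypothetical stand-ins. In the attacks described above the mutators would call a helper LLM; here they are plain string functions so the example stays self-contained.

```python
import random
from typing import Callable, List, Optional

# A mutator maps one candidate string to a modified candidate.
# (Hypothetical type alias; the excerpt's mutators use a helper LLM instead.)
Mutator = Callable[[str], str]


def genetic_refine(
    seeds: List[str],
    mutators: List[Mutator],
    score: Callable[[str], float],
    rounds: int = 5,
    pool_size: int = 4,
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Iteratively mutate candidates and keep the highest-scoring ones as seeds."""
    rng = rng or random.Random(0)  # fixed seed so the sketch is reproducible
    pool = list(seeds)
    for _ in range(rounds):
        # Apply one randomly chosen mutator to every candidate in the pool.
        children = [rng.choice(mutators)(candidate) for candidate in pool]
        # Merge parents and children, dropping duplicates but keeping order.
        candidates = list(dict.fromkeys(pool + children))
        # Select the most promising candidates as seeds for the next round.
        candidates.sort(key=score, reverse=True)
        pool = candidates[:pool_size]
    return pool


# Toy stand-ins: the "score" simply counts the letter "a".
def add_a(text: str) -> str:
    return text + "a"


def add_b(text: str) -> str:
    return text + "b"


def count_a(text: str) -> float:
    return float(text.count("a"))


final_pool = genetic_refine(["x", "y"], [add_a, add_b], count_a)
```

Because parents are retained alongside their children in each round, the best score in the pool never decreases in this sketch. That is a design choice of the example, not a property claimed by the source.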
Feb-10-2026, 14:12:09 GMT