When LLM Meets DRL: Advancing Jailbreaking Efficiency via DRL-guided Search
Xuan Chen
Neural Information Processing Systems
Warning: This paper contains unfiltered and potentially harmful content.

Recent studies have developed jailbreaking attacks, which construct jailbreaking prompts to "fool" LLMs into responding to harmful questions. Early-stage jailbreaking attacks required access to model internals or significant human effort. More advanced attacks use genetic algorithms to mount automatic, black-box attacks. However, the random nature of genetic algorithms significantly limits the effectiveness of these attacks.