JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
Neural Information Processing Systems
Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques do not adequately address.
Oct-10-2025, 04:29:10 GMT
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
  - Law (1.00)
  - Information Technology > Security & Privacy (1.00)
  - Government (1.00)
  - Health & Medicine (0.68)
- Technology: