JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Neural Information Processing Systems 

Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques does not adequately address.
