Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs

Neural Information Processing Systems 

To address these issues, we introduce JailTrickBench, which evaluates how various attack settings affect LLM performance and provides a baseline for jailbreak attacks, with the goal of encouraging the adoption of a standardized evaluation framework.
