JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models

Neural Information Processing Systems

Jailbreak attacks cause large language models (LLMs) to generate harmful, unethical, or otherwise objectionable content. Evaluating these attacks presents a number of challenges, which the current collection of benchmarks and evaluation techniques does not adequately address.





Supplementary Material for "AllClear: A Comprehensive Dataset and Benchmark for Cloud Removal in Satellite Imagery"

Neural Information Processing Systems

We include a datasheet for our dataset following the methodology from "Datasheets for Datasets" [Gebru et al., 2021]. In this section, we include the prompts from Gebru et al. [2021] in blue, and in [...]

For what purpose was the dataset created? Was there a specific task in mind? The dataset was created to facilitate research development on cloud removal in satellite imagery. Specifically, our task is more temporally aligned than previous benchmarks.

Who created the dataset (e.g., which team, research group) and on behalf of which entity (e.g., [...])?

Who funded the creation of the dataset?



Appendix

Neural Information Processing Systems

Moreover, there remains a considerable gap between the ability to answer exam questions and the application of this knowledge in real-world situations. To bridge the gap and thoroughly assess LLMs in supporting the crop science field, we introduce CROP.