LongSafetyBench: Long-Context LLMs Struggle with Safety Issues
Mianqiu Huang, Xiaoran Liu, Shaojun Zhou, Mozhi Zhang, Chenkun Tan, Pengyu Wang, Qipeng Guo, Zhe Xu, Linyang Li, Zhikai Lei, Linlin Li, Qun Liu, Yaqian Zhou, Xipeng Qiu, Xuanjing Huang
WARNING: This paper contains unsafe content.

With the development of large language models (LLMs), the sequence length of these models continues to increase, drawing significant attention to long-context language models. However, the evaluation of these models has been limited primarily to their capabilities, with little research focusing on their safety. Existing work, such as ManyShotJailbreak, has demonstrated to some extent that long-context language models can exhibit safety concerns, but the methods used are limited and lack comprehensiveness. In response, we introduce LongSafetyBench, the first benchmark designed to objectively and comprehensively evaluate the safety of long-context models. LongSafetyBench consists of 10 task categories, with an average length of 41,889 words. After testing eight long-context language models on LongSafetyBench, we found that existing models generally exhibit insufficient safety capabilities: the proportion of safe responses from most mainstream long-context LLMs is below 50%. Moreover, models' safety performance in long-context scenarios does not always align with that in short-context scenarios. Further investigation revealed that long-context models tend to overlook harmful content within lengthy texts. We also propose a simple yet effective solution that allows open-source models to achieve performance comparable to that of top-tier closed-source models. We believe that LongSafetyBench can serve as a valuable benchmark for evaluating the safety capabilities of long-context language models, and we hope that our work will encourage the broader community to pay attention to the safety of long-context models and to contribute solutions that improve it.

Recently, thanks to more advanced model architectures (Xiao et al., 2024b;a; Liu et al., 2024a) and expanded position encoding techniques (Su et al., 2023; Liu et al., 2024b), the context length of language models has been extended significantly (Achiam et al., 2023; Reid et al., 2024). In the foreseeable future, as language models continue to evolve and tackle increasingly complex problems, the demand for handling longer contexts is expected to grow accordingly; we anticipate that long-context language models will become mainstream. Previous research on long-context language models, such as LongBench (Bai et al., 2024), L-Eval (An et al., 2023), and RULER (Hsieh et al., 2024), has typically focused on their capabilities while neglecting their safety. In short-context scenarios, the safety issues of language models have already been extensively studied (Zhang ...).

[Figure residue: harm categories include Illegal Activities, Misinformation Harm, Offensiveness and Bias; each question is composed of a long context.]
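The abstract reports a per-model "proportion of safe responses" across the benchmark's task categories. As a minimal illustrative sketch (not the authors' actual evaluation harness), the snippet below shows how such a proportion could be computed for a multiple-choice safety benchmark; all field names (`category`, `options`, `safe_options`) and the `model_answer` callback are hypothetical.

```python
# Hypothetical sketch: scoring the proportion of safe responses on a
# multiple-choice long-context safety benchmark. Field names and the
# model_answer callback are illustrative assumptions, not the paper's API.
from collections import defaultdict

def safe_response_rate(items, model_answer):
    """items: dicts with 'category', 'context', 'question', 'options',
    and 'safe_options' (the set of option letters judged safe).
    model_answer: fn(context, question, options) -> chosen option letter."""
    per_category = defaultdict(lambda: [0, 0])  # category -> [safe, total]
    for item in items:
        choice = model_answer(item["context"], item["question"], item["options"])
        counts = per_category[item["category"]]
        counts[0] += choice in item["safe_options"]
        counts[1] += 1
    return {cat: safe / total for cat, (safe, total) in per_category.items()}

if __name__ == "__main__":
    # Toy items standing in for long-context questions.
    items = [
        {"category": "Illegal Activities", "context": "...", "question": "...",
         "options": ["A", "B"], "safe_options": {"A"}},
        {"category": "Misinformation Harm", "context": "...", "question": "...",
         "options": ["A", "B"], "safe_options": {"B"}},
    ]
    # A stub model that always answers "A".
    rates = safe_response_rate(items, lambda c, q, o: "A")
    print(rates)  # {'Illegal Activities': 1.0, 'Misinformation Harm': 0.0}
```

Averaging such per-category rates is one way the headline figure (below 50% safe responses for most mainstream long-context LLMs) could be derived.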
arXiv.org Artificial Intelligence
Nov-11-2024