All Languages Matter: On the Multilingual Safety of Large Language Models
Wang, Wenxuan, Tu, Zhaopeng, Chen, Chang, Yuan, Youliang, Huang, Jen-tse, Jiao, Wenxiang, Lyu, Michael R.
arXiv.org Artificial Intelligence
Safety lies at the core of developing and deploying large language models (LLMs). Experimental results show that all LLMs produce significantly more unsafe responses for non-English queries than for English ones, indicating the necessity of developing safety alignment for non-English languages. In addition, we propose several simple and effective prompting methods to improve the multilingual safety of ChatGPT by evoking safety knowledge and improving cross-lingual generalization of safety alignment. Our prompting method can significantly reduce the ratio of unsafe responses from 19.1% to 9.7% for non-English queries.
Recent advances in scaling large language models (LLMs) have produced breakthroughs in artificial intelligence (AI). With the rapid increase in model parameters and training data, LLMs have gained emergent abilities in various tasks, including writing assistance (Gao et al., 2022), code generation (Gao et al., 2023), and machine translation (Jiao et al., 2023). Owing to their impressive performance, a number of LLMs have been launched by commercial companies and academic institutions, including OpenAI's GPT models (Brown et al., 2020; OpenAI, 2022), Google's Bard (Pichai, 2023), and Meta's LLaMA (Touvron et al., 2023a;b). Such extensive deployment makes ensuring the safety of LLMs imperative. A number of studies have sought to align LLMs with human ethics and preferences to improve their safety, using data filtering (Xu et al., 2020; Welbl et al., 2021; Wang et al., 2022), supervised fine-tuning (Ouyang et al., 2022), reinforcement learning from human feedback (RLHF) (Christiano et al., 2017), and red teaming (Perez et al., 2022; Ganguli et al., 2022a). Most existing work on safety alignment has focused on interaction in English (OpenAI, 2023).
However, LLMs such as ChatGPT are globally deployed services with users around the world, and they frequently communicate in languages other than English with users from non-English-speaking regions.
Oct-2-2023