BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset

Neural Information Processing Systems

In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs).