BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Neural Information Processing Systems
In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety alignment in large language models (LLMs).
Dec-25-2025, 03:31:54 GMT