BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
