Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models
