One-Shot Safety Alignment for Large Language Models via Optimal Dualization
