Towards Comprehensive and Efficient Post Safety Alignment of Large Language Models via Safety Patching
