SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models
