SafetyAnalyst: Interpretable, transparent, and steerable LLM safety moderation
