Defending Against Alignment-Breaking Attacks via Robustly Aligned LLM