Guarding the Meaning: Self-Supervised Training for Semantic Robustness in Guard Models

Open in new window