Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts
