Automatic Hallucination Assessment for Aligned Large Language Models via Transferable Adversarial Attacks
