The Case for Repeatable, Open, and Expert-Grounded Hallucination Benchmarks in Large Language Models
