Zero-shot Benchmarking: A Framework for Flexible and Scalable Automatic Evaluation of Language Models
