An Interpretable and Scalable Framework for Evaluating Large Language Models
