Catwalk: A Unified Language Model Evaluation Framework for Many Datasets
