What the HellaSwag? On the Validity of Common-Sense Reasoning Benchmarks
