Establishing Construct Validity in LLM Capability Benchmarks Requires Nomological Networks
