The Partial Testimony of Logs: Evaluation of Language Model Generation under Confounded Model Choice
