Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations