An Approach to Grounding AI Model Evaluations in Human-derived Criteria