Evaluating AI Evaluation: Perils and Prospects