PersonaEval: Are LLM Evaluators Human Enough to Judge Role-Play?