No-Human in the Loop: Agentic Evaluation at Scale for Recommendation