PentestJudge: Judging Agent Behavior Against Operational Requirements