APRIL: Annotations for Policy evaluation with Reliable Inference from LLMs