Efficient Evaluation of LLM Performance with Statistical Guarantees