Predicting the Performance of Black-box Language Models with Follow-up Queries

Open in new window