Don't Pass@k: A Bayesian Framework for Large Language Model Evaluation
