Bayesian Evaluation of Large Language Model Behavior
