Towards Reliable and Practical LLM Security Evaluations via Bayesian Modelling