Bayesian Calibration of Win Rate Estimation with LLM Evaluators