Estimating LLM Consistency: A User Baseline vs Surrogate Metrics