Quantifying Ambiguity in Categorical Annotations: A Measure and Statistical Inference Framework