From Ground Truth to Measurement: A Statistical Framework for Human Labeling