GPT is Not an Annotator: The Necessity of Human Annotation in Fairness Benchmark Construction