Why Don't Prompt-Based Fairness Metrics Correlate?