Why Don't Prompt-Based Fairness Metrics Correlate?
