Measuring Bias or Measuring the Task: Understanding the Brittle Nature of LLM Gender Biases