Undesirable Biases in NLP: Addressing Challenges of Measurement