"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models