Guess What I am Thinking: A Benchmark for Inner Thought Reasoning of Role-Playing Language Agents