Playing 20 Question Game with Policy-Based Reinforcement Learning