Deep Reinforcement Learning for Chatbots Using Clustered Actions and Human-Likeness Rewards