Reward-Punishment Reinforcement Learning with Maximum Entropy