When Maximum Entropy Misleads Policy Optimization