An Alternative Softmax Operator for Reinforcement Learning