Balancing Two-Player Stochastic Games with Soft Q-Learning