Learning Nash Equilibria in Zero-Sum Stochastic Games via Entropy-Regularized Policy Approximation