Entropy-Regularized Token-Level Policy Optimization for Large Language Models