EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning