EPO: Explicit Policy Optimization for Strategic Reasoning in LLMs via Reinforcement Learning
