MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs