MARSHAL: Incentivizing Multi-Agent Reasoning via Self-Play with Strategic LLMs
