ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning

Open in new window