ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Open in new window