ReMA: Learning to Meta-think for LLMs with Multi-Agent Reinforcement Learning