ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Jun-14-2026, 00:37:51 GMT–Neural Information Processing Systems

Recent research on reasoning in large language models (LLMs) has sought to improve their performance by integrating meta-thinking, which enables models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking.

large language model, machine learning, reinforcement learning

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (0.91)
  - Machine Learning > Reinforcement Learning (0.68)