AITopics | rema

rema

ReMA: Learning to Meta-think for LLMs with Multi-agent Reinforcement Learning

Neural Information Processing Systems | Jun-22-2026, 07:17:08 GMT

Recent research on the reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking, which enables models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking.

arxiv preprint arxiv, large language model, machine learning, (20 more...)

Neural Information Processing Systems

Country: Asia (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

ReMA: Learning to Meta-Think for LLMs with Multi-agent Reinforcement Learning

Neural Information Processing Systems | Jun-14-2026, 00:37:51 GMT

Recent research on the reasoning of Large Language Models (LLMs) has sought to further enhance their performance by integrating meta-thinking, which enables models to monitor, evaluate, and control their reasoning processes for more adaptive and effective problem-solving. However, current single-agent work lacks a specialized design for acquiring meta-thinking, resulting in low efficacy. To address this challenge, we introduce Reinforced Meta-thinking Agents (ReMA), a novel framework that leverages Multi-Agent Reinforcement Learning (MARL) to elicit meta-thinking behaviors, encouraging LLMs to think about thinking.

large language model, machine learning, reinforcement learning, (10 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Reconstruct and Match: Out-of-Distribution Robustness via Topological Homogeneity

Chaoqi Chen (1), Luyao Tang (2), Hui Huang (1)

(1) College of Computer Science and Software Engineering, Shenzhen University

Neural Information Processing Systems | Nov-20-2025, 05:44:12 GMT

Since deep learning models are usually deployed in non-stationary environments, it is imperative to improve their robustness to out-of-distribution (OOD) data.

artificial intelligence, machine learning, natural language, (21 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.40)
Europe > Netherlands > South Holland > Delft (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

Unlocking the Power of Multi-Agent LLM for Reasoning: From Lazy Agents to Deliberation

Zhang, Zhiwei, Li, Xiaomin, Lin, Yudi, Liu, Hui, Chandradevan, Ramraj, Wu, Linlin, Lin, Minhua, Wang, Fali, Tang, Xianfeng, He, Qi, Wang, Suhang

arXiv.org Artificial Intelligence | Nov-5-2025

Large Language Models (LLMs) trained with reinforcement learning and verifiable rewards have achieved strong results on complex reasoning tasks. Recent work extends this paradigm to a multi-agent setting, where a meta-thinking agent proposes plans and monitors progress while a reasoning agent executes subtasks through sequential conversational turns. Despite promising performance, we identify a critical limitation: lazy agent behavior, in which one agent dominates while the other contributes little, undermining collaboration and collapsing the setup to an ineffective single agent. In this paper, we first provide a theoretical analysis showing why lazy behavior naturally arises in multi-agent reasoning. We then introduce a stable and efficient method for measuring causal influence, helping mitigate this issue. Finally, as collaboration intensifies, the reasoning agent risks getting lost in multi-turn interactions and trapped by previous noisy responses. To counter this, we propose a verifiable reward mechanism that encourages deliberation by allowing the reasoning agent to discard noisy outputs, consolidate instructions, and restart its reasoning process when necessary. Extensive experiments demonstrate that our framework alleviates lazy agent behavior and unlocks the full potential of the multi-agent framework for complex reasoning tasks.

Techniques such as chain-of-thought prompting (Wei et al., 2022; Kojima et al., 2022) and structured methods like Tree-of-Thoughts and Graph-of-Thoughts (Yao et al., 2023; Besta et al., 2024) expand the space for deliberation. More recently, multi-agent frameworks enable LLMs with specialized roles to collaborate via planning, delegation, and debate, echoing human team dynamics (Li et al., 2023; Wu et al., 2024a; Chen et al., 2023; Du et al., 2023; Yuan & Xie).

To support multi-agent and multi-turn reinforcement learning, multi-turn Group Relative Policy Optimization (GRPO) (Wan et al., 2025; Shi et al., 2025; Wei et al., 2025) and its variants (Guo et al., 2025b; Zhang et al., 2025c; Ning et al., 2025; Xue et al., 2025) compute advantages and importance ratios at the turn level, enabling finer-grained optimization and more precise credit assignment. Building on this foundation, ReMA (Wan et al., 2025) introduces a multi-agent LLM reasoning framework with two specialized roles: a meta-thinking agent, which decomposes tasks, sets intermediate goals, and adapts based on feedback, and a reasoning agent, which works through the task step by step. The agents alternate sequentially, but since only a final outcome reward is available, ReMA computes a group advantage following GRPO (Shao et al., 2024) and uniformly assigns this trajectory-level signal to every turn in the rollout.
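The credit-assignment scheme described in the last sentence can be sketched in a few lines. This is a minimal illustration only, not the ReMA implementation: the function names `group_advantages` and `broadcast_to_turns`, the example rewards, the use of the population standard deviation, and the `eps` stabilizer are all assumptions made here. It normalizes each rollout's final reward against its group, then copies that single value onto every turn of the rollout.

```python
from statistics import mean, pstdev


def group_advantages(rewards, eps=1e-6):
    """GRPO-style group advantage: center and scale each rollout's
    outcome reward by the mean and std of its group.
    (Population std and eps are assumptions of this sketch.)"""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def broadcast_to_turns(advantages, turns_per_rollout):
    """Assign the same trajectory-level advantage to every turn
    of the corresponding rollout (uniform credit assignment)."""
    return [[a] * n for a, n in zip(advantages, turns_per_rollout)]


# Hypothetical group of four rollouts for one prompt:
# final outcome rewards and the number of turns in each rollout.
rewards = [1.0, 0.0, 0.0, 1.0]
turns = [3, 2, 4, 1]

adv = group_advantages(rewards)
per_turn = broadcast_to_turns(adv, turns)

for i, turn_advs in enumerate(per_turn):
    print(f"rollout {i}: {len(turn_advs)} turns, advantage {turn_advs[0]:+.3f}")
```

Because every turn in a rollout receives an identical signal, this scheme cannot distinguish a turn that helped from one that contributed nothing, which is the coarse credit assignment that turn-level variants aim to refine.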

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2511.02303

Country:

North America > United States > Utah (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Michigan (0.04)
(2 more...)

Genre:

Workflow (0.66)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Reconstruct and Match: Out-of-Distribution Robustness via Topological Homogeneity

Chaoqi Chen (1), Luyao Tang (2), Hui Huang (1)

(1) College of Computer Science and Software Engineering, Shenzhen University

Neural Information Processing Systems | Oct-10-2025, 19:30:31 GMT

Since deep learning models are usually deployed in non-stationary environments, it is imperative to improve their robustness to out-of-distribution (OOD) data.

adaptation, domain generalization, generalization, (17 more...)

Neural Information Processing Systems

Country:

Asia > China > Guangdong Province > Shenzhen (0.40)
Europe > Netherlands > South Holland > Delft (0.04)
Asia > China > Fujian Province > Xiamen (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(2 more...)

PREMA: Principled Tensor Data Recovery from Multiple Aggregated Views

Almutairi, Faisal M., Kanatsoulis, Charilaos I., Sidiropoulos, Nicholas D.

arXiv.org Machine Learning | Oct-26-2019

Multidimensional data have become ubiquitous and are frequently involved in situations where the information is aggregated over multiple data atoms. The aggregation can be over time or other features, such as geographical location or group affiliation. We often have access to multiple aggregated views of the same data, each aggregated in one or more dimensions, especially when data are collected or measured by different agencies. However, data mining and machine learning models require detailed data for personalized analysis and prediction. Thus, data disaggregation algorithms are becoming increasingly important in various domains. The goal of this paper is to reconstruct finer-scale data from multiple coarse views, aggregated over different (subsets of) dimensions. The proposed method, called PREMA, leverages low-rank tensor factorization tools to provide recovery guarantees under certain conditions. PREMA is flexible in the sense that it can perform disaggregation on data that have missing entries, i.e., partially observed. The proposed method considers challenging scenarios: i) the available views of the data are aggregated in two dimensions, i.e., double aggregation, and ii) the aggregation patterns are unknown. Experiments on real data from different domains, i.e., sales data from retail companies, crime counts, and weather observations, are presented to showcase the effectiveness of PREMA.
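To make the setting in the abstract concrete, the sketch below builds a small low-rank tensor and two aggregated views of it. It illustrates only the data model, not the PREMA recovery algorithm. The assumptions here are: a rank-R CP (canonical polyadic) model for the fine-scale tensor, known aggregation matrices `U` and `V` that sum consecutive pairs of indices, and all sizes and variable names, none of which come from the abstract (which also covers unknown aggregation patterns). The check at the end shows the standard fact that aggregating a CP tensor along one mode is the same as aggregating that mode's factor matrix, so both coarse views share the remaining factors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical fine-scale tensor X (e.g. time x location x item) of CP rank R.
I, J, K, R = 6, 4, 5, 2
A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
X = np.einsum("ir,jr,kr->ijk", A, B, C)

# Assumed aggregation matrices: each row sums two consecutive fine indices.
U = np.kron(np.eye(3), np.ones((1, 2)))  # shape (3, 6), aggregates mode 1
V = np.kron(np.eye(2), np.ones((1, 2)))  # shape (2, 4), aggregates mode 2

# Two coarse views of the same data, each aggregated in a different dimension.
Y_u = np.einsum("ai,ijk->ajk", U, X)  # shape (3, 4, 5)
Y_v = np.einsum("bj,ijk->ibk", V, X)  # shape (6, 2, 5)

# Each view is itself a CP tensor whose aggregated mode has factor U @ A or V @ B.
Y_u_from_factors = np.einsum("ar,jr,kr->ajk", U @ A, B, C)
Y_v_from_factors = np.einsum("ir,br,kr->ibk", A, V @ B, C)

print(np.allclose(Y_u, Y_u_from_factors), np.allclose(Y_v, Y_v_from_factors))
```

Under these assumptions, view `Y_u` still carries the fine-scale factors `B` and `C`, and view `Y_v` carries `A` and `C`, which gives an intuition for why combining several coarse views can pin down a finer-scale tensor.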

aggregation, rema, tensor, (17 more...)

arXiv.org Machine Learning

1910.12001

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
Oceania > Australia (0.14)
North America > United States > Illinois > Cook County > Chicago (0.04)
(2 more...)

Genre: Research Report > New Finding (0.46)

Industry:

Retail (1.00)
Health & Medicine (0.93)
Information Technology > Security & Privacy (0.40)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
