AITopics

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Europe > Germany > Hamburg (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.78)

Neural Information Processing Systems, Dec-24-2025, 09:01:57 GMT

Multi-agent active perception with prediction rewards

Multi-agent active perception is a task where a team of agents cooperatively gathers observations to compute a joint estimate of a hidden variable. The task is decentralized and the joint estimate can only be computed after the task ends by fusing observations of all agents. The objective is to maximize the accuracy of the estimate. The accuracy is quantified by a centralized prediction reward determined by a centralized decision-maker who perceives the observations gathered by all agents after the task ends. In this paper, we model multi-agent active perception as a decentralized partially observable Markov decision process (Dec-POMDP) with a convex centralized prediction reward. We prove that by introducing individual prediction actions for each agent, the problem is converted into a standard Dec-POMDP with a decentralized prediction reward. The loss due to decentralization is bounded, and we give a sufficient condition for when it is zero. Our results allow application of any Dec-POMDP solution algorithm to multi-agent active perception problems, and enable planning to reduce uncertainty without explicit computation of joint estimates. We demonstrate the empirical usefulness of our results by applying a standard Dec-POMDP algorithm to multi-agent active perception problems, showing increased scalability in the planning horizon.

active perception, multi-agent active perception, prediction reward

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)
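To illustrate why prediction actions help, here is a minimal NumPy sketch. It assumes a finite set of prediction actions and a hypothetical state-action table R(s, a). The expected reward of any single prediction action is linear in the belief b. Taking the best prediction action gives rho(b) = max_a sum_s b(s) R(s, a), which is convex in b. This shows the general idea only, not the paper's decentralized construction or its bound.

```python
import numpy as np

def expected_action_reward(belief, reward_table, action):
    """Expected reward of one fixed prediction action: linear in the belief."""
    return float(np.asarray(belief, dtype=float) @ np.asarray(reward_table, dtype=float)[:, action])

def prediction_reward(belief, reward_table):
    """rho(b) = max_a sum_s b(s) R(s, a).

    Each column of the (hypothetical) reward_table is one prediction action.
    A max of linear functions of b is convex in b.
    """
    b = np.asarray(belief, dtype=float)
    return float(np.max(b @ np.asarray(reward_table, dtype=float)))

# Example table: reward 1 for predicting the true hidden state, 0 otherwise,
# so rho(b) is the probability of the most likely state.
R = np.eye(3)
print(prediction_reward([0.7, 0.2, 0.1], R))
```

With the identity table, a confident belief earns more than an uncertain one. This is the sense in which maximizing a convex prediction reward drives agents to reduce uncertainty about the hidden variable.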

arXiv.org Artificial Intelligence, Dec-10-2025

LLM Collaboration With Multi-Agent Reinforcement Learning

Liu, Shuo, Chen, Tianle, Liang, Zeyu, Lyu, Xueguang, Amato, Christopher

A large amount of work has been done in Multi-Agent Systems (MAS) for modeling and solving problems with multiple interacting agents. However, most LLMs are pretrained independently and not specifically optimized for coordination. For example, existing LLM fine-tuning frameworks rely on individual rewards, which require complex reward designs for each agent to encourage collaboration. To address this challenge, we model LLM collaboration as a cooperative Multi-Agent Reinforcement Learning (MARL) problem. We develop a multi-agent, multi-turn algorithm, Multi-Agent Group Relative Policy Optimization (MAGRPO), to solve it, building on current RL approaches for LLMs as well as MARL techniques. Our experiments on LLM writing and coding collaboration demonstrate that fine-tuning multiple LLMs with MAGRPO enables agents to generate high-quality responses efficiently through effective cooperation. Our approach opens the door to using MARL methods for LLM collaboration and highlights the associated challenges.

large language model, machine learning, natural language

2508.04652

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
North America > United States > California > Santa Clara County > Stanford (0.04)

Genre: Research Report (1.00)

Industry: Transportation (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.47)
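As a point of reference, group relative policy optimization (GRPO) replaces a learned critic with advantages normalized within a group of responses sampled for the same prompt. The sketch below assumes that a group of joint multi-agent rollouts receives one shared team reward per rollout, and that every agent is credited with the same group-relative advantage. It illustrates that assumption only and is not MAGRPO's published update. The names `group_relative_advantages` and `per_agent_advantages` are hypothetical.

```python
import numpy as np

def group_relative_advantages(joint_rewards, eps=1e-8):
    """GRPO-style advantage: (r - group mean) / (group std + eps).

    joint_rewards holds one shared team reward per joint rollout, all sampled
    for the same prompt (an assumption of this sketch).
    """
    r = np.asarray(joint_rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)

def per_agent_advantages(joint_rewards, n_agents):
    """Broadcast the shared group-relative advantage to every agent.

    Row i holds the advantages used to update agent i's policy. Because the
    reward is a single team reward, all rows are identical.
    """
    adv = group_relative_advantages(joint_rewards)
    return np.tile(adv, (n_agents, 1))

# Example: three joint rollouts of two collaborating LLM agents.
print(per_agent_advantages([1.0, 2.0, 3.0], n_agents=2))
```

A shared reward removes the need to design a separate reward for each agent, which is the difficulty the abstract attributes to individual-reward fine-tuning.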

Forkel, Johannes, Foerster, Jakob

Entropy is all you need for Inter-Seed Cross-Play in Hanabi

arXiv.org Artificial Intelligence, Dec-1-2025

We find that in Hanabi, one of the most complex and popular benchmarks for zero-shot coordination and ad-hoc teamplay, a standard implementation of independent PPO with a slightly higher entropy coefficient of 0.05 instead of the typically used 0.01 achieves a new state of the art in cross-play between different seeds, beating by a significant margin all previous algorithms that were specifically designed for this setting. We provide an intuition for why sufficiently high entropy regularization ensures that different random seeds produce joint policies that are mutually compatible. We also empirically find that a high $\lambda_{\text{GAE}}$ of around 0.9, and using RNNs instead of only feed-forward layers in the actor-critic architecture, strongly increase inter-seed cross-play. While these results demonstrate the dramatic effect that hyperparameters can have not just on self-play scores but also on cross-play scores, we also show that there are simple Dec-POMDPs in which standard policy gradient methods with increased entropy regularization cannot achieve perfect inter-seed cross-play, demonstrating the continuing need for new algorithms for zero-shot coordination.

large language model, machine learning, natural language

2511.22581

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.69)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.46)
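The two hyperparameters highlighted above, the entropy coefficient and $\lambda_{\text{GAE}}$, enter the standard PPO update in the places shown below. This is a minimal NumPy sketch of generalized advantage estimation and the clipped PPO loss with an entropy bonus, as each agent in independent PPO would compute them on its own trajectories. The defaults clip_eps=0.2, vf_coef=0.5 and gamma=0.99 are assumed common values, not values reported in the abstract. This is not the authors' code.

```python
import numpy as np

def gae(rewards, values, dones, last_value, gamma=0.99, lam=0.9):
    """Generalized advantage estimation, computed backwards in time.

    dones[t] = 1 means the episode ended after step t, so no bootstrapping
    from step t + 1. last_value is V(s_T) for the state after the rollout.
    """
    T = len(rewards)
    adv = np.zeros(T)
    next_adv, next_value = 0.0, last_value
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * next_value * nonterminal - values[t]
        next_adv = delta + gamma * lam * nonterminal * next_adv
        adv[t] = next_adv
        next_value = values[t]
    return adv

def categorical_entropy(probs):
    """Entropy of each row of action probabilities."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(np.clip(probs, 1e-12, 1.0)), axis=-1)

def ppo_loss(logp_new, logp_old, advantages, probs_new, values, returns,
             clip_eps=0.2, ent_coef=0.05, vf_coef=0.5):
    """Clipped PPO loss (to minimize); a larger ent_coef rewards higher entropy."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    surrogate = np.minimum(ratio * adv, np.clip(ratio, 1 - clip_eps, 1 + clip_eps) * adv)
    policy_loss = -surrogate.mean()
    value_loss = np.mean((np.asarray(returns, dtype=float) - np.asarray(values, dtype=float)) ** 2)
    entropy = categorical_entropy(probs_new).mean()
    return policy_loss + vf_coef * value_loss - ent_coef * entropy
```

Raising ent_coef from 0.01 to 0.05 only rescales the entropy term. The abstract's claim is that this small change is enough to make policies from different seeds mutually compatible.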

Neural Information Processing Systems, Sep-30-2025, 12:48:29 GMT

Point Based Value Iteration with Optimal Belief Compression for Dec-POMDPs

This paper presents four major results towards solving decentralized partially observable Markov decision problems (Dec-POMDPs), culminating in an algorithm that outperforms all existing algorithms on all but one of the standard infinite-horizon benchmark problems. ... The program is notable because its linear relaxation is very often integral. ... These actions correspond to strategies of a CBG. ... We choose one such algorithm, point-based value iteration, and modify it to produce the first tractable value iteration method for Dec-POMDPs that outperforms existing algorithms.

algorithm, decpomdp, optimal belief compression

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.86)
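The excerpt names point-based value iteration but does not describe the Dec-POMDP variant. As background, here is a minimal NumPy sketch of one classical point-based backup for a single-agent POMDP. It backs up alpha-vectors only at a finite set of belief points. This is the standard POMDP building block, not the authors' Dec-POMDP algorithm with optimal belief compression. The array layouts of T, O and R are assumptions of this sketch.

```python
import numpy as np

def pbvi_backup(beliefs, alphas, T, O, R, gamma):
    """One point-based backup for a single-agent POMDP.

    beliefs: (N, S) belief points; alphas: (K, S) current alpha-vectors
    T: (A, S, S), T[a, s, s2] = P(s2 | s, a)
    O: (A, S, Z), O[a, s2, z] = P(z | s2, a)
    R: (S, A) immediate rewards
    Returns (N, S): one new alpha-vector per belief point.
    """
    A, Z = T.shape[0], O.shape[2]
    # g[a, z, k, s] = sum_{s2} T[a, s, s2] * O[a, s2, z] * alphas[k, s2]
    g = np.einsum('ast,atz,kt->azks', T, O, alphas)
    new_alphas = []
    for b in beliefs:
        # For each (action, observation), keep the projected vector best at b.
        best = np.argmax(np.einsum('azks,s->azk', g, b), axis=2)         # (A, Z)
        chosen = g[np.arange(A)[:, None], np.arange(Z)[None, :], best]   # (A, Z, S)
        g_ab = R.T + gamma * chosen.sum(axis=1)                          # (A, S)
        # Keep the action whose backed-up vector has the highest value at b.
        new_alphas.append(g_ab[np.argmax(g_ab @ b)])
    return np.array(new_alphas)

def value(belief, alphas):
    """Value of a belief under a set of alpha-vectors."""
    return float(np.max(alphas @ np.asarray(belief, dtype=float)))
```

Repeating the backup converges toward the value function at the chosen belief points. For a single state with reward 1 and gamma = 0.9, two backups from a zero vector give 1.9, and many backups approach 1 / (1 - 0.9) = 10.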

Neural Information Processing Systems, Aug-15-2025, 10:16:33 GMT

Multi-agent active perception with prediction rewards

Multi-agent active perception is a task where a team of agents cooperatively gathers observations to compute a joint estimate of a hidden variable.

dec-pomdp, nullm, prediction reward

Country:

Europe > Netherlands > South Holland > Delft (0.04)
North America > Canada (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Hamburg (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.84)

Tessera, Kale-ab Abebe, Hinckeldey, Leonard, Zamboni, Riccardo, Abel, David, Storkey, Amos

Remembering the Markov Property in Cooperative MARL

arXiv.org Artificial Intelligence, Jul-25-2025

Cooperative multi-agent reinforcement learning (MARL) is typically formalised as a Decentralised Partially Observable Markov Decision Process (Dec-POMDP), where agents must reason about the environment and other agents' behaviour. In practice, current model-free MARL algorithms use simple recurrent function approximators to address the challenge of reasoning about others using partial information. In this position paper, we argue that the empirical success of these methods is not due to effective Markov signal recovery, but rather to learning simple conventions that bypass environment observations and memory. Through a targeted case study, we show that co-adapting agents can learn brittle conventions, which then fail when partnered with non-adaptive agents. Crucially, the same models can learn grounded policies when the task design necessitates it, revealing that the issue is not a fundamental limitation of the learning models but a failure of the benchmark design. Our analysis also suggests that modern MARL environments may not adequately test the core assumptions of Dec-POMDPs. We therefore advocate for new cooperative environments built upon two core principles: (1) behaviours grounded in observations and (2) memory-based reasoning about other agents, ensuring success requires genuine skill rather than fragile, co-adapted agreements.

artificial intelligence, machine learning, reinforcement learning

2507.18333

Country: Europe (0.46)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Han, Shuai, Dastani, Mehdi, Wang, Shihan

Credit Assignment and Efficient Exploration based on Influence Scope in Multi-agent Reinforcement Learning

arXiv.org Artificial Intelligence, May-14-2025

Training cooperative agents in sparse-reward scenarios poses significant challenges for multi-agent reinforcement learning (MARL). Without clear feedback on actions at each step in sparse-reward settings, previous methods struggle with precise credit assignment among agents and with effective exploration. In this paper, we introduce a novel method to deal with both credit assignment and exploration problems in reward-sparse domains. Accordingly, we propose an algorithm that calculates the Influence Scope of Agents (ISA) on states by taking specific values of the dimensions/attributes of states that can be influenced by individual agents. The mutual dependence between agents' actions and state attributes is then used to calculate the credit assignment and to delimit the exploration space for each individual agent. We then evaluate ISA in a variety of sparse-reward multi-agent scenarios. The results show that our method significantly outperforms the state-of-the-art baselines.

artificial intelligence, machine learning, reinforcement learning
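The abstract does not say how ISA computes an agent's influence scope. The following hypothetical toy only illustrates the idea of mapping each agent to the state dimensions it can affect, then using that map for credit. It assumes a deterministic transition function that can be queried at sampled states. `step_fn`, `influence_scope` and `split_credit` are invented names, not ISA's API, and the brute-force enumeration of joint actions is only practical for tiny problems.

```python
from itertools import product
import numpy as np

def influence_scope(step_fn, states, action_sets):
    """Boolean (n_agents, n_dims) map: can agent i change state dimension d?

    Hypothetical probe: from each sampled state and joint action, vary only
    agent i's action and mark every next-state dimension that changes.
    """
    n_agents, n_dims = len(action_sets), len(states[0])
    scope = np.zeros((n_agents, n_dims), dtype=bool)
    for s in states:
        for joint in product(*action_sets):
            base = np.asarray(step_fn(s, joint))
            for i in range(n_agents):
                for alt in action_sets[i]:
                    if alt == joint[i]:
                        continue
                    varied = list(joint)
                    varied[i] = alt
                    scope[i] |= np.asarray(step_fn(s, tuple(varied))) != base
    return scope

def split_credit(scope, dim_rewards):
    """Hypothetical credit rule: share each dimension's reward equally
    among the agents whose scope covers that dimension."""
    counts = scope.sum(axis=0)
    share = np.where(counts > 0,
                     np.asarray(dim_rewards, dtype=float) / np.maximum(counts, 1),
                     0.0)
    return scope.astype(float) @ share

# Toy world: each agent moves only its own coordinate.
step = lambda s, a: (s[0] + a[0], s[1] + a[1])
print(influence_scope(step, [(0, 0), (1, 2)], [(-1, 0, 1), (-1, 0, 1)]))
```

In this toy, each agent's scope covers only its own coordinate, so a reward tied to that coordinate is credited to that agent alone. The same mask could restrict which state dimensions an agent explores.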