AITopics | joint policy

Collaborating Authors

joint policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

In-Context Fully Decentralized Cooperative Multi-Agent Reinforcement Learning

Neural Information Processing SystemsJun-22-2026, 22:27:55 GMT

In this paper, we consider fully decentralized cooperative multi-agent reinforcement learning, where each agent has access only to the states, its local actions, and the shared rewards. The absence of information about other agents' actions typically leads to the non-stationarity problem during per-agent value function updates, and the relative overgeneralization issue during value function estimation. However, existing works fail to address both issues simultaneously, as they lack the capability to model the agents' joint policy in a fully decentralized setting. To overcome this limitation, we propose a simple yet effective method named Return-Aware Context (RAC). RAC formalizes the dynamically changing task, as locally perceived by each agent, as a contextual Markov Decision Process (MDP), and addresses both nonstationarity and relative overgeneralization through return-aware context modeling. Specifically, the contextual MDP attributes the non-stationary local dynamics of each agent to switches between contexts, each corresponding to a distinct joint policy.

agent, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia > China (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.84)

Add feedback

In-Context Fully Decentralized Cooperative Multi-Agent Reinforcement Learning

Neural Information Processing SystemsJun-14-2026, 05:37:12 GMT

In this paper, we consider fully decentralized cooperative multi-agent reinforcement learning, where each agent has access only to the states, its local actions, and the shared rewards. The absence of information about other agents' actions typically leads to the non-stationarity problem during per-agent value function updates, and the relative overgeneralization issue during value function estimation. However, existing works fail to address both issues simultaneously, as they lack the capability to model the agents' joint policy in a fully decentralized setting. To overcome this limitation, we propose a simple yet effective method named Return-Aware Context (RAC). RAC formalizes the dynamically changing task, as locally perceived by each agent, as a contextual Markov Decision Process (MDP), and addresses both non-stationarity and relative overgeneralization through return-aware context modeling. Specifically, the contextual MDP attributes the non-stationary local dynamics of each agent to switches between contexts, each corresponding to a distinct joint policy.

artificial intelligence, machine learning, proceedings, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation

Neural Information Processing SystemsMay-1-2026, 04:57:48 GMT

One of the main challenges in offline Reinforcement Learning (RL) is the distribution shift that arises from the learned policy deviating from the data collection policy. This is often addressed by avoiding out-of-distribution (OOD) actions during policy improvement as their presence can lead to substantial performance degradation. This challenge is amplified in the offline Multi-Agent RL (MARL) setting since the joint action space grows exponentially with the number of agents. To avoid this curse of dimensionality, existing MARL methods adopt either value decomposition methods or fully decentralized training of individual agents. However, even when combined with standard conservatism principles, these methods can still result in the selection of OOD joint actions in offline MARL. To this end, we introduce AlberDICE, an offline MARL algorithm that alternatively performs centralized training of individual agents based on stationary distribution optimization. AlberDICE circumvents the exponential complexity of MARL by computing the best response of one agent at a time while effectively avoiding OOD joint action selection. Theoretically, we show that the alternating optimization procedure converges to Nash policies. In the experiments, we demonstrate that AlberDICE significantly outperforms baseline algorithms on a standard suite of MARL benchmarks.

agent, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)

Add feedback

CODA: Coordination via On-Policy Diffusion for Multi-Agent Offline Reinforcement Learning

Hedman, Marcel, Tessera, Kale-ab Abebe, Formanek, Juan Claude, Sims, Anya, Zamboni, Riccardo, McInroe, Trevor, Torr, John, Fosong, Elliot

arXiv.org Machine LearningApr-28-2026

Offline multi-agent reinforcement learning (MARL) enables policy learning from fixed datasets, but is prone to coordination failure: agents trained on static, off-policy data converge to suboptimal joint behaviours because they cannot co-adapt as their policies change. We introduce CODA (Coordination via On-Policy Diffusion for Multi-Agent Reinforcement Learning), a diffusion-based multi-agent trajectory generator for data augmentation that samples conditioned on the current joint policy, producing synthetic experience which reflects the evolving behaviours of the agents, thereby providing a mechanism for co-adaptation. We find that previous diffusion-based augmentation approaches are insufficient for fostering multi-agent coordination because they produce static augmented datasets that do not evolve as the current joint policy changes during training; CODA resolves this by more closely simulating on-policy learning and is a meaningful step toward coordinated behaviours in the offline setting. CODA is algorithm-agnostic and can be layered onto both model-free and model-based offline reinforcement learning pipelines as an augmentation module. Empirically, CODA not only resolves canonical coordination pathologies in continuous polynomial games but also delivers strong results on the more complex MaMuJoCo continuous-control benchmarks.

machine learning, reinforcement learning, trajectory, (15 more...)

arXiv.org Machine Learning

2604.23308

Country:

Europe (0.67)
North America > United States (0.46)

Genre: Research Report (0.50)

Industry:

Education (0.46)
Health & Medicine (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Details

Neural Information Processing SystemsApr-27-2026, 10:23:28 GMT

A.1 Difference between the performance of two joint policies In Section 3.1, the difference between the performance of two joint policies is expressed as follows: The proof is a multi-agent version of the proof in (Kakade and Langford, 2002). Now we provide the mathematical detail formally. A.2 Approximation that matches the true value to first order In Section 3.1, we claim that Jπ( π) matches J( π) to first order. Intuitively, this means that a sufficiently small update of the joint policy which improves Jπ( π) will also improve J( π). Now we prove it formally.

agent, artificial intelligence, section 3, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.34)

Add feedback

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

Neural Information Processing SystemsApr-24-2026, 13:10:23 GMT

Measuring and promoting policy diversity is critical for solving games with strong non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). With that in mind, maintaining a pool of diverse policies via open-ended learning is an attractive solution, which can generate auto-curricula to avoid being exploited. However, in conventional open-ended learning algorithms, there are no widely accepted definitions for diversity, making it hard to construct and evaluate the diverse policies. In this work, we summarize previous concepts of diversity and work towards offering a unified measure of diversity in multi-agent open-ended learning to include all elements in Markov games, based on both Behavioral Diversity (BD) and Response Diversity (RD).

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Multi-Agent Coordination via Multi-Level Communication

Neural Information Processing SystemsFeb-18-2026, 07:43:02 GMT

SeqComm are guaranteed to improve monotonically and converge.

agent, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Country: Asia > China > Guangdong Province > Shenzhen (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Add feedback

A Algorithm

Neural Information Processing SystemsFeb-13-2026, 19:16:23 GMT

This section consists of three parts, with each subsequent part building upon the previous one. Appendix A.1 covers the fundamentals of RL, where the actor-critic method is introduced. Appendix A.2 describes the RL algorithm for a single fulfillment agent, which is the proximal policy Appendix A.3 presents the MARL algorithm for the Currently, policy-based methods [Deisenroth et al., 2013] are prevalent because they are compatible with stochastic To sum up, the complete procedure is given in Algorithm 1.Algorithm 1 Heterogeneous Multi-Agent Reinforcement Learning for Order Fulfillment. With regard to the advantage estimator, we set the GAE parameters [Schulman et al., 2016] To highlight how our proposed benchmark differs from existing approaches focused on sub-tasks of order fulfillment, we compare the objectives, observations, and actions in Table 1. It should be noted that multiple formulations exist for each sub-task.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Filters

Collaborating Authors

joint policy

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

In-Context Fully Decentralized Cooperative Multi-Agent Reinforcement Learning

In-Context Fully Decentralized Cooperative Multi-Agent Reinforcement Learning

AlberDICE: Addressing Out-Of-Distribution Joint Actions in Offline Multi-Agent RL via Alternating Stationary Distribution Correction Estimation

CODA: Coordination via On-Policy Diffusion for Multi-Agent Offline Reinforcement Learning

Details

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

8078e8c3055303a884ffae2d3ea00338-Supplemental-Conference.pdf

Multi-Agent Coordination via Multi-Level Communication

e5b6eb1dbabff82838d5e99f62de37c8-Paper-Conference.pdf

A Algorithm