AITopics | Agents

0a113ef6b61820daa5611c870ed8d5ee-Supplemental.pdf

Neural Information Processing SystemsApr-24-2026, 14:57:39 GMT

artificial intelligence, machine learning, qmix, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Regularized Softmax Deep Multi-Agent Q-Learning

Neural Information Processing SystemsApr-24-2026, 14:57:35 GMT

Tackling overestimation in Q-learning is an important problem that has been extensively studied in single-agent reinforcement learning, but has received comparatively little attention in the multi-agent setting. In this work, we empirically demonstrate that QMIX, a popular Q-learning algorithm for cooperative multiagent reinforcement learning (MARL), suffers from a more severe overestimation in practice than previously acknowledged, and is not mitigated by existing approaches. We rectify this with a novel regularization-based update scheme that penalizes large joint action-values that deviate from a baseline and demonstrate its effectiveness in stabilizing learning. Furthermore, we propose to employ a softmax operator, which we efficiently approximate in a novel way in the multiagent setting, to further reduce the potential overestimation bias. Our approach, Regularized Softmax (RES) Deep Multi-Agent Q-Learning, is general and can be applied to any Q-learning based MARL algorithm. We demonstrate that, when applied to QMIX, RES avoids severe overestimation and significantly improves performance, yielding state-of-the-art results in a variety of cooperative multi-agent tasks, including the challenging StarCraft II micromanagement benchmarks.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Optimality and Stability in Federated Learning: AGame-theoretic Approach

Neural Information Processing SystemsApr-24-2026, 14:36:59 GMT

Federated learning is a distributed learning paradigm where multiple agents, each only with access to local data, jointly learn a global model. There has recently been an explosion of research aiming not only to improve the accuracy rates of federated learning, but also provide certain guarantees around social good properties such as total error. One branch of this research has taken a game-theoretic approach, and in particular, prior work has viewed federated learning as a hedonic game, where error-minimizing players arrange themselves into federating coalitions. This past work proves the existence of stable coalition partitions, but leaves open a wide range of questions, including how far from optimal these stable solutions are. In this work, we motivate and define a notion of optimality given by the average error rates among federating agents (players).

artificial intelligence, learning, machine learning, (17 more...)

Neural Information Processing Systems

Industry:

Health & Medicine (1.00)
Education (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.49)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Add feedback

080acdcce72c06873a773c4311c2e464-Paper.pdf

Neural Information Processing SystemsApr-24-2026, 13:30:36 GMT

artificial intelligence, machine learning, reinforcement learning, (15 more...)

Neural Information Processing Systems

Country:

North America > United States (0.67)
Asia (0.46)

Genre: Research Report (0.93)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine (1.00)
Education (0.67)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
(2 more...)

Add feedback

On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games

Neural Information Processing SystemsApr-24-2026, 13:11:09 GMT

Softmax policy gradient is a popular algorithm for policy optimization in singleagent reinforcement learning, particularly since projection is not needed for each gradient update. However, in multi-agent systems, the lack of central coordination introduces significant additional difficulties in the convergence analysis. Even for a stochastic game with identical interest, there can be multiple Nash Equilibria (NEs), which disables proof techniques that rely on the existence of a unique global optimum. Moreover, the softmax parameterization introduces non-NE policies with zero gradient, making it difficult for gradient-based algorithms in seeking NEs. In this paper, we study the finite time convergence of decentralized softmax gradient play in a special form of game, Markov Potential Games (MPGs), which includes the identical interest game as a special case. We investigate both gradient play and natural gradient play, with and without log-barrier regularization. The established convergence rates for the unregularized cases contain a trajectory dependent constant that can be arbitrarily large, whereas the log-barrier regularization overcomes this drawback, with the cost of slightly worse dependence on other factors such as the action set size. An empirical study on an identical interest matrix game confirms the theoretical findings.

artificial intelligence, gradient play, machine learning, (16 more...)

Neural Information Processing Systems

Country:

North America (0.45)
Oceania > Australia (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

Add feedback

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

Neural Information Processing SystemsApr-24-2026, 13:10:23 GMT

Measuring and promoting policy diversity is critical for solving games with strong non-transitive dynamics where strategic cycles exist, and there is no consistent winner (e.g., Rock-Paper-Scissors). With that in mind, maintaining a pool of diverse policies via open-ended learning is an attractive solution, which can generate auto-curricula to avoid being exploited. However, in conventional open-ended learning algorithms, there are no widely accepted definitions for diversity, making it hard to construct and evaluate the diverse policies. In this work, we summarize previous concepts of diversity and work towards offering a unified measure of diversity in multi-agent open-ended learning to include all elements in Markov games, based on both Behavioral Diversity (BD) and Response Diversity (RD).

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Neural Information Processing Systems

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

0bf54b80686d2c4dc0808c2e98d430f7-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing SystemsApr-24-2026, 12:50:16 GMT

machine learning, reinforcement, reinforcement learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.46)
Asia > China (0.28)

Genre: Research Report (0.68)

Industry:

Information Technology (1.00)
Banking & Finance > Trading (1.00)

Technology:

Information Technology > Data Science > Data Quality (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Add feedback

08342dc6ab69f23167b4123086ad4d38-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 12:32:22 GMT

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)

Add feedback

Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards

Neural Information Processing SystemsApr-24-2026, 12:32:18 GMT

As the capabilities of artificial agents improve, they are being increasingly deployed to service multiple diverse objectives and stakeholders. However, the composition of these objectives is often performed ad hoc, with no clear justification. This paper takes a normative approach to multi-objective agency: from a set of intuitively appealing axioms, it is shown that Markovian aggregation of Markovian reward functions is not possible when the time preference (discount factor) for each objective may vary. It follows that optimal multi-objective agents must admit rewards that are non-Markovian with respect to the individual objectives. To this end, a practical non-Markovian aggregation scheme is proposed, which overcomes the impossibility with only one additional parameter for each objective. This work offers new insights into sequential, multi-objective agency and intertemporal choice, and has practical implications for the design of AI systems deployed to serve multiple generations of principals with varying time preference.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

Neural Information Processing Systems

Country: North America (0.28)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.96)

Add feedback

LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning

Neural Information Processing SystemsApr-24-2026, 12:31:57 GMT

Cooperative multi-agent reinforcement learning (MARL) has made prominent progress in recent years. For training efficiency and scalability, most of the MARL algorithms make all agents share the same policy or value network. However, in many complex multi-agent tasks, different agents are expected to possess specific abilities to handle different subtasks. In those scenarios, sharing parameters indiscriminately may lead to similar behavior across all agents, which will limit the exploration efficiency and degrade the final performance. To balance the training complexity and the diversity of agent behavior, we propose a novel framework to learn dynamic subtask assignment (LDSA) in cooperative MARL. Specifically, we first introduce a subtask encoder to construct a vector representation for each subtask according to its identity.

artificial intelligence, machine learning, subtask, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Add feedback

Filters

Collaborating Authors

Agents

0a113ef6b61820daa5611c870ed8d5ee-Supplemental.pdf

Regularized Softmax Deep Multi-Agent Q-Learning

Optimality and Stability in Federated Learning: AGame-theoretic Approach

080acdcce72c06873a773c4311c2e464-Paper.pdf

On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games

Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games

0bf54b80686d2c4dc0808c2e98d430f7-Paper-Datasets_and_Benchmarks.pdf

08342dc6ab69f23167b4123086ad4d38-Supplemental-Conference.pdf

Consistent Aggregation of Objectives with Diverse Time Preferences Requires Non-Markovian Rewards

LDSA: Learning Dynamic Subtask Assignment in Cooperative Multi-Agent Reinforcement Learning