AITopics | markov game

statements and

Neural Information Processing Systems | Apr-29-2026, 14:25:40 GMT

Consider a two-player Markov game in which both players affect the transitions. We show that best-responding to a correlated policy σ is equivalent to best-responding to the opponent's marginal policy under σ. The proof follows from the equivalence of the two MDPs. Before that, given a (possibly correlated) joint policy σ, we define a nonlinear program, (PBR), whose optimal solutions are the best-response policies of each agent k to $\sigma_{-k}$, together with the values for each state s and timestep h. A.2 Proof of Theorem 3.2. The best-response program. First, we state the following lemma, which will prove useful for several of our arguments. Lemma A.1 (Best-response LP).
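
The reduction described above can be checked numerically. The sketch below is a minimal illustration under stated assumptions. It is not the paper's program (PBR) or its LP. It assumes a finite horizon H and a Markov correlated policy, given as a per-step joint distribution σ_h(a, b | s) in a hypothetical tabular game where agent k picks a and the opponent picks b. Agent k's best response is computed by backward induction that only reads the opponent's marginal of σ. A brute-force search over deterministic policies then checks the result. All function and variable names are hypothetical.

```python
import itertools

import numpy as np


def opponent_marginal(sigma):
    """Marginal of a correlated policy over the opponent's action.

    sigma[h, s] is a joint distribution over (agent k's action a, opponent's action b).
    Returns m with m[h, s, b] = sum_a sigma[h, s, a, b].
    """
    return sigma.sum(axis=2)


def best_response(sigma, reward, trans):
    """Finite-horizon backward induction for agent k against sigma.

    sigma:  (H, S, A, B) per-step correlated joint policy (assumed Markov)
    reward: (S, A, B) agent k's reward r(s, a, b)
    trans:  (S, A, B, S) transition kernel P(s' | s, a, b)
    Returns a greedy deterministic policy (H, S) and values (H + 1, S).
    """
    H, S, A, B = sigma.shape
    m = opponent_marginal(sigma)                    # only the marginal is used below
    V = np.zeros((H + 1, S))                        # V[H] = 0 at the horizon
    pi = np.zeros((H, S), dtype=int)
    for h in reversed(range(H)):
        q_joint = reward + trans @ V[h + 1]         # (S, A, B)
        Q = np.einsum("sab,sb->sa", q_joint, m[h])  # average out the opponent's action
        pi[h] = Q.argmax(axis=1)
        V[h] = Q.max(axis=1)
    return pi, V


def evaluate(pi, m, reward, trans, s0):
    """Value of a deterministic policy pi when the opponent samples b ~ m[h, s]."""
    H, S = pi.shape
    V = np.zeros(S)
    for h in reversed(range(H)):
        V = np.array([m[h, s] @ (reward[s, pi[h, s]] + trans[s, pi[h, s]] @ V)
                      for s in range(S)])
    return V[s0]


rng = np.random.default_rng(0)
H, S, A, B = 2, 2, 2, 2
sigma = rng.random((H, S, A, B))
sigma /= sigma.sum(axis=(2, 3), keepdims=True)      # a correlated joint policy
reward = rng.random((S, A, B))
trans = rng.random((S, A, B, S))
trans /= trans.sum(axis=3, keepdims=True)

pi, V = best_response(sigma, reward, trans)

# A product policy with the same opponent marginal yields the same best response.
m = opponent_marginal(sigma)
sigma_prod = np.full((H, S, A, 1), 1.0 / A) * m[:, :, None, :]
_, V_prod = best_response(sigma_prod, reward, trans)

# Brute force over all deterministic Markov policies of agent k.
brute = max(evaluate(np.array(p).reshape(H, S), m, reward, trans, 0)
            for p in itertools.product(range(A), repeat=H * S))
print(V[0, 0], V_prod[0, 0], brute)
```

`best_response` reads `sigma` only through `opponent_marginal`. So any two joint policies with the same opponent marginal produce identical values, which is the equivalence stated above in miniature.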

artificial intelligence, global minimum, value function

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)

d6f681da2151687df12cc21a1c1e3527-Paper-Conference.pdf

Neural Information Processing Systems | Apr-28-2026, 00:12:38 GMT

artificial intelligence, machine learning, reinforcement learning

Country: North America > United States > California (0.46)

Genre: Research Report (0.46)

Technology:

Information Technology > Game Theory (0.93)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.70)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem

Neural Information Processing Systems | Mar-21-2026, 23:29:10 GMT

We study the problem of learning a Nash equilibrium (NE) in Markov games, a cornerstone problem in multi-agent reinforcement learning (MARL). In particular, we focus on infinite-horizon adversarial team Markov games (ATMGs), in which agents that share a common reward function compete against a single opponent, *the adversary*. These games unify two-player zero-sum Markov games and Markov potential games, resulting in a setting that encompasses both collaboration and competition. Kalogiannis et al. (2023) provided an efficient equilibrium computation algorithm for ATMGs that presumes knowledge of the reward and transition functions and has no sample complexity guarantees. We contribute a learning algorithm that uses MARL policy gradient methods, with iteration and sample complexity polynomial in the approximation error $\epsilon$ and the natural parameters of the ATMG, resolving the main caveats of the solution of Kalogiannis et al. (2023).
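
The quality of an approximate NE is usually measured by how much any single player can gain by deviating unilaterally. An ε-approximate NE keeps this gap at most ε. The sketch below computes that gap for a hypothetical single-state (normal-form) team game with two team members and one adversary, assuming the adversary's reward is the negative of the team's. It only evaluates a candidate solution and is not the learning algorithm described above. The infinite-horizon case would also require value functions over states.

```python
import numpy as np


def team_value(R, x1, x2, y):
    """Expected team reward when all three players randomize independently."""
    return float(np.einsum("ijk,i,j,k->", R, x1, x2, y))


def nash_gap(R, x1, x2, y):
    """Largest gain any single player can get by deviating unilaterally.

    R[i, j, k]: shared reward of team members 1 and 2 when they play i and j
    and the adversary plays k; the adversary receives -R[i, j, k].
    """
    v = team_value(R, x1, x2, y)
    best1 = np.einsum("ijk,j,k->i", R, x2, y).max()   # member 1 deviates
    best2 = np.einsum("ijk,i,k->j", R, x1, y).max()   # member 2 deviates
    worst = np.einsum("ijk,i,j->k", R, x1, x2).min()  # adversary deviates (minimizes)
    return float(max(best1 - v, best2 - v, v - worst))


# Hypothetical game: the team scores 1 only if both members pick the
# adversary's action, so the adversary wants to avoid the team's choice.
R = np.zeros((2, 2, 2))
R[0, 0, 0] = R[1, 1, 1] = 1.0

uniform = np.array([0.5, 0.5])
pure = np.array([1.0, 0.0])
print(nash_gap(R, uniform, uniform, uniform))  # 0.0: no profitable deviation
print(nash_gap(R, pure, pure, pure))           # 1.0: the adversary gains by switching
```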

artificial intelligence, machine learning, reinforcement learning

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.58)

A Deep Bayesian Policy Reuse Approach Against Non-Stationary Agents

Neural Information Processing Systems | Mar-16-2026, 23:01:05 GMT

In multiagent domains, coping with non-stationary agents that change their behavior over time is a challenging problem: an agent usually needs to quickly detect the other agent's policy during online interaction and then adapt its own policy accordingly.
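
Detecting the other agent's policy online is often framed as Bayesian inference over a library of known policies. The sketch below shows that core step in tabular form, under hypothetical assumptions:

- The opponent's known policies are given as action probabilities.
- A performance table scores each of our response policies against each known opponent policy.
- We pick the response with the highest expected utility under the current belief.

This is a simplified illustration, not the paper's deep approach. All names and numbers are hypothetical.

```python
import numpy as np


def update_belief(belief, likelihoods):
    """One Bayes step: posterior[i] is proportional to belief[i] * P(obs | opponent policy i)."""
    posterior = belief * likelihoods
    total = posterior.sum()
    if total == 0.0:
        # The observation is impossible under every known policy, which hints at
        # a new opponent type; resetting to uniform is one simple choice.
        return np.full_like(belief, 1.0 / belief.size)
    return posterior / total


def select_response(belief, performance):
    """performance[i, j]: expected utility of our response j against opponent policy i."""
    return int(np.argmax(belief @ performance))


# Hypothetical library of two known opponent policies over two actions.
opponent_policies = np.array([[0.9, 0.1],
                              [0.2, 0.8]])
performance = np.array([[1.0, 0.0],    # response 0 is tuned to opponent policy 0
                        [0.0, 1.0]])   # response 1 is tuned to opponent policy 1

belief = np.full(2, 0.5)
for observed_action in [1, 1, 1]:
    belief = update_belief(belief, opponent_policies[:, observed_action])
    print(belief.round(3), "-> response", select_response(belief, performance))
```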

artificial intelligence, machine learning, proceedings

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.37)

Inequity aversion improves cooperation in intertemporal social dilemmas

Neural Information Processing Systems | Mar-16-2026, 22:30:05 GMT

Groups of humans are often able to find ways to cooperate with one another in complex, temporally extended social dilemmas. Models based on behavioral economics can explain this phenomenon only for unrealistic, stateless matrix games. Recently, multi-agent reinforcement learning has been applied to generalize social dilemma problems to temporally and spatially extended Markov games. However, this line of work has not yet produced an agent that learns to cooperate in social dilemmas as humans do. A key insight is that many, but not all, human individuals have inequity-averse social preferences. These preferences promote a particular resolution of the matrix-game social dilemma in which inequity-averse individuals are personally pro-social and punish defectors. Here we extend this idea to Markov games and show that it promotes cooperation in several types of sequential social dilemma, via a profitable interaction with policy learnability. In particular, we find that inequity aversion improves temporal credit assignment for the important class of intertemporal social dilemmas. These results help explain how large-scale cooperation may emerge and persist.
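
The abstract does not spell out the inequity-averse utility. A common choice in behavioral economics is the Fehr-Schmidt model, sketched below as an assumption:

$u_i = r_i - \frac{\alpha}{N-1}\sum_{j \ne i}\max(r_j - r_i, 0) - \frac{\beta}{N-1}\sum_{j \ne i}\max(r_i - r_j, 0)$

Here `alpha` weights disadvantageous inequity (others ahead of agent i) and `beta` weights advantageous inequity (agent i ahead of others). How the paper adapts this to temporally extended Markov games is not shown here.

```python
def inequity_averse_utilities(rewards, alpha, beta):
    """Fehr-Schmidt utilities for N >= 2 agents (assumed form, see lead-in).

    alpha weights disadvantageous inequity (others earn more than agent i),
    beta weights advantageous inequity (agent i earns more than others).
    """
    n = len(rewards)
    utilities = []
    for i, r_i in enumerate(rewards):
        others = [r_j for j, r_j in enumerate(rewards) if j != i]
        envy = sum(max(r_j - r_i, 0.0) for r_j in others)
        guilt = sum(max(r_i - r_j, 0.0) for r_j in others)
        utilities.append(r_i - alpha / (n - 1) * envy - beta / (n - 1) * guilt)
    return utilities


print(inequity_averse_utilities([3.0, 1.0, 0.0], alpha=1.0, beta=0.5))
# [1.75, -0.25, -2.0]
```

With `alpha > 0`, an agent that falls behind has an incentive to lower others' rewards, which is one way to model punishing defectors. With `beta > 0`, getting ahead is penalized, which favors pro-social behavior.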

machine learning, reinforcement learning, social dilemma

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.97)

eb1848290d5a7de9c9ccabc67fefa211-Paper.pdf

Neural Information Processing Systems | Mar-14-2026, 07:00:29 GMT

algorithm, convergence, matrix game

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Game Theory (0.95)

Unified

Neural Information Processing Systems | Feb-19-2026, 07:24:31 GMT

Policy optimization, i.e., the class of algorithms that learn to make sequential decisions by local search directly over the agent's policy, is widely used in reinforcement learning [40, 44, 45].
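
A minimal example of local search directly on the policy is exact-gradient softmax policy gradient on a single-state problem (a multi-armed bandit). The sketch below uses hypothetical rewards and shows the textbook update only, not the unified method this paper proposes. For $\pi_\theta(a) \propto e^{\theta_a}$ and $J(\theta) = \sum_a \pi_\theta(a) r(a)$, the gradient is $\partial J / \partial \theta_a = \pi_\theta(a)(r(a) - J(\theta))$.

```python
import numpy as np


def softmax(theta):
    z = np.exp(theta - theta.max())  # subtract the max for numerical stability
    return z / z.sum()


def softmax_policy_gradient(rewards, lr=0.5, steps=2000):
    """Exact-gradient ascent on J(theta) = sum_a pi_theta(a) * r(a)."""
    theta = np.zeros_like(rewards, dtype=float)
    for _ in range(steps):
        pi = softmax(theta)
        J = pi @ rewards
        # For a softmax policy, dJ/dtheta_a = pi(a) * (r(a) - J).
        theta += lr * pi * (rewards - J)
    return softmax(theta)


rewards = np.array([1.0, 0.5, 0.2])
print(softmax_policy_gradient(rewards).round(3))  # mass concentrates on action 0
```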

artificial intelligence, machine learning, reinforcement learning

Country: Asia > Middle East > Jordan (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)

Team-Fictitious Play for Reaching Team-Nash Equilibrium in Multi-team Games

Neural Information Processing Systems | Feb-18-2026, 03:42:12 GMT

Team-Nash equilibrium (TNE) predicts the outcomes of such coordinated interactions.
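
In fictitious play, the classical dynamic that the name Team-Fictitious Play refers to, each player best-responds to the empirical frequency of its opponents' past actions. The sketch below shows only the standard two-player version, using matching pennies as a hypothetical example. The team-level variant and its convergence to a TNE are not reproduced here.

```python
import numpy as np


def fictitious_play(A, iters=20000):
    """Classic two-player fictitious play on a zero-sum matrix game.

    A[i, j] is the row player's payoff; the column player receives -A[i, j].
    Returns the empirical action frequencies of both players.
    """
    m, n = A.shape
    row_counts = np.zeros(m)
    col_counts = np.zeros(n)
    row_counts[0] += 1  # arbitrary first moves
    col_counts[0] += 1
    for _ in range(iters - 1):
        # Each player best-responds to the opponent's empirical play so far.
        i = int(np.argmax(A @ col_counts))   # row maximizes its payoff
        j = int(np.argmin(row_counts @ A))   # column minimizes the row's payoff
        row_counts[i] += 1
        col_counts[j] += 1
    return row_counts / iters, col_counts / iters


A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])  # matching pennies
x, y = fictitious_play(A)
print(x.round(3), y.round(3))  # both drift toward the mixed equilibrium (0.5, 0.5)
```

In two-player zero-sum games, a classical result guarantees that the empirical frequencies of fictitious play converge to a Nash equilibrium. That is why the printed frequencies approach 0.5 here.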

agent, artificial intelligence, machine learning

Country:

North America > United States (0.04)
Europe > Greece > Attica > Athens (0.04)
Asia > Middle East > Republic of Türkiye (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry: Leisure & Entertainment > Games (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.92)

b6e271e596574f2b2dfadec6b3ba22a4-Paper-Conference.pdf

Neural Information Processing Systems | Feb-17-2026, 16:33:10 GMT

equilibrium, machine learning, reinforcement learning

Country:

North America > Canada > Ontario > Toronto (0.28)
North America > United States > Hawaii (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.69)

Learning Equilibria in Adversarial Team Markov Games: A Nonconvex-Hidden-Concave Min-Max Optimization Problem

Neural Information Processing Systems | Feb-17-2026, 07:18:08 GMT

The joint decisions of the agents influence both the individual rewards and the transitions of the environment. In general, MARL is concerned with steering the multi-agent system toward a favorable outcome. Through the lens of game theory, the notion of a "favorable outcome" is formally defined through concepts such as the Nash equilibrium.

adversary, artificial intelligence, optimization problem

Country: