Many problems in multiagent learning involve repeated play. As such, naive application of Nash equilibrium concepts is often inappropriate. A recent algorithm in the literature (Stimpson & Goodrich 2003) uses a Nash bargaining perspective instead of a Nash equilibrium perspective, and learns to cooperate in self play in a social dilemma without exposing itself to exploitation by selfish agents. It does so without knowledge of the game or the actions of other agents. In this paper, we show that this algorithm converges, with high probability, to near-Pareto efficient solutions in self play in most nonzero-sum n-agent, m-action matrix games, provided that parameters are set appropriately. Furthermore, we present a tremble-based extension of this algorithm and show that it is guaranteed to play near-Pareto efficient solutions an arbitrarily high percentage of the time in self play for the same large class of matrix games, while allowing adaptation to changing environments.
The goal of an agent playing a repeated game is to maximize its payoffs over time. In repeated games against other learning agents, this often requires that an agent learn to offer and accept profitable compromises. To do so, past research suggests that agents must implement both teaching and following strategies. However, few algorithms successfully employ both kinds of strategies simultaneously. In this paper, we present an algorithm (called SPaM) that employs both kinds of strategies simultaneously in 2-player matrix games when the complete game matrix is observable. We show (empirically) that SPaM learns quickly and effectively when associating with a large class of agents, including itself, best-response learners, and (perhaps) humans.
In this paper, a brief overview of neglect-tolerant human-robot interaction is presented. Recent results of a neglect-tolerance study are then summarized. The problem of how neglect tolerance affects the way a human interacts with multiple robots is then posed, and a scenario is constructed that illustrates how managing multiple robots can produce a problem with the form of a prisoner's dilemma. An abstraction of this prisoner's dilemma problem is then presented, and two robot learning algorithms are outlined that may address key points in this abstracted dilemma.
Many scenarios involve a tension between individual interest and the interests of others. Such situations are called social dilemmas. Because of their ubiquity in economic and social interactions, constructing agents that can solve social dilemmas is of prime importance to researchers interested in multi-agent systems. We discuss why social dilemmas are particularly difficult, propose a way to measure the 'success' of a strategy, and review recent work on using deep reinforcement learning to construct agents that can do well in both perfect- and imperfect-information bilateral social dilemmas.
Learning in many multi-agent settings is inherently repeated play. This calls into question the naive application of single-play Nash equilibria in multi-agent learning and suggests, instead, the application of the give-and-take principles of bargaining. We modify and analyze a satisficing algorithm based on (Karandikar et al., 1998) that is compatible with the bargaining perspective. This algorithm is a form of relaxation search that converges to a satisficing equilibrium without knowledge of game payoffs or other agents' actions. We then develop an M-action, N-player social dilemma that encodes the key elements of the Prisoner's Dilemma. This game is instructive because it characterizes social dilemmas with more than two agents and more than two choices. We show how several different multi-agent learning algorithms behave in this social dilemma, and demonstrate that the satisficing algorithm converges, with high probability, to a Pareto efficient solution in self play and to the single-play Nash equilibrium against selfish agents. Finally, we present theoretical results that characterize the behavior of the algorithm.
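To make the satisficing dynamics concrete, the following is a minimal sketch of Karandikar-style aspiration-based play in a repeated 2-player Prisoner's Dilemma. The specific payoff values, the relaxation rate, the deterministic switch-on-dissatisfaction rule, and the function name are illustrative assumptions, not the paper's exact formulation; each agent observes only its own payoff, never the game matrix or the other agent's action.

```python
import random

# Hypothetical symmetric Prisoner's Dilemma payoffs (T=5 > R=3 > P=1 > S=0),
# listed for the first-named player.  These values are assumptions.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def satisficing_self_play(rounds=2000, lam=0.99, seed=0):
    """Two aspiration-based satisficers in self play.

    Each agent compares its own payoff to its aspiration level: if the
    payoff meets the aspiration, it repeats its action; otherwise it
    switches.  Aspirations relax toward received payoffs,
        a <- lam * a + (1 - lam) * payoff,
    so persistent disappointment lowers the bar until some joint action
    (here, mutual cooperation) becomes mutually satisfying.
    """
    rng = random.Random(seed)
    actions = [rng.choice("CD"), rng.choice("CD")]
    aspirations = [5.0, 5.0]  # optimistic initial aspirations
    history = []
    for _ in range(rounds):
        payoffs = (PAYOFF[(actions[0], actions[1])],
                   PAYOFF[(actions[1], actions[0])])
        history.append(tuple(actions))
        for i in (0, 1):
            if payoffs[i] < aspirations[i]:  # dissatisfied: switch action
                actions[i] = "D" if actions[i] == "C" else "C"
            aspirations[i] = lam * aspirations[i] + (1 - lam) * payoffs[i]
    return history, aspirations
```

With slow relaxation and optimistic initial aspirations, the pair cycles through unsatisfying joint actions until both aspirations drop below the mutual-cooperation payoff, at which point (C, C) locks in and the aspirations converge toward 3, illustrating the relaxation search toward a satisficing (and Pareto efficient) equilibrium described above.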