In this article, I show the robustness of optimality of exploration ratioagainst the number of agents (agent population)under multiagent learning (MAL) situation in nonstationary environments.Agent population will affect efficiency of agents' learning becauseeach agent's learning causes noisy factors for others.From this point, exploration ratio should be small to make MAL effective.In nonstationary environments, on the other hand, each agent needs explore with enough probability to catch-upchanges of the environments.This means the exploration ratio need to be significantly large.I investigate the relation between the population and the efficiency ofexploration based on a theorem about relations betweenthe exploration ratio and a lower boundary of learning error.Finally, it is shown that the population of the agents does not affectthe optimal exploration ratio under a certain condition.This consequence is confirmed by several experimentsusing population games with various reward functions.
Recent multi-agent extensions of Q-Learning require knowledge of other agents' payoffs and Q-functions, and assume game-theoretic play at all times by all other agents. This paper proposes a fundamentally different approach, dubbed "Hyper-Q" Learning, in which values of mixed strategies rather than base actions are learned, and in which other agents' strategies are estimated from observed actions via Bayesian inference. Hyper-Qmay be effective against many different types of adaptive agents, even if they are persistently dynamic. Against certain broad categories of adaptation, it is argued that Hyper-Q may converge to exact optimaltime-varying policies. In tests using Rock-Paper-Scissors, Hyper-Q learns to significantly exploit an Infinitesimal Gradient Ascent (IGA) player, as well as a Policy Hill Climber (PHC) player. Preliminary analysis of Hyper-Q against itself is also presented.
The interaction of multiple autonomous agents gives rise to highly dynamic and nondeterministic environments, contributing to the complexity in applications such as automated financial markets, smart grids, or robotics. Due to the sheer number of situations that may arise, it is not possible to foresee and program the optimal behaviour for all agents beforehand. Consequently, it becomes essential for the success of the system that the agents can learn their optimal behaviour and adapt to new situations or circumstances. The past two decades have seen the emergence of reinforcement learning, both in single and multi-agent settings, as a strong, robust and adaptive learning paradigm. Progress has been substantial, and a wide range of algorithms are now available. An important challenge in the domain of multi-agent learning is to gain qualitative insights into the resulting system dynamics. In the past decade, tools and methods from evolutionary game theory have been successfully employed to study multi-agent learning dynamics formally in strategic interactions. This article surveys the dynamical models that have been derived for various multi-agent reinforcement learning algorithms, making it possible to study and compare them qualitatively. Furthermore, new learning algorithms that have been introduced using these evolutionary game theoretic tools are reviewed. The evolutionary models can be used to study complex strategic interactions. Examples of such analysis are given for the domains of automated trading in stock markets and collision avoidance in multi-robot systems. The paper provides a roadmap on the progress that has been achieved in analysing the evolutionary dynamics of multi-agent learning by highlighting the main results and accomplishments.
The problem of multi-agent learning and adaptation has attracted a great deal of attention in recent years. It has been suggested that the dynamics of multi agent learning can be studied using replicator equations from population biology. Most existing studies so far have been limited to discrete strategy spaces with a small number of available actions. In many cases, however, the choices available to agents are better characterized by continuous spectra. This paper suggests a generalization of the replicator framework that allows to study the adaptive dynamics of Q-learning agents with continuous strategy spaces. Instead of probability vectors, agents strategies are now characterized by probability measures over continuous variables. As a result, the ordinary differential equations for the discrete case are replaced by a system of coupled integral--differential replicator equations that describe the mutual evolution of individual agent strategies. We derive a set of functional equations describing the steady state of the replicator dynamics, examine their solutions for several two-player games, and confirm our analytical results using simulations.
In common-interest stochastic games all players receive an identical payoff. Players participating in such games must learn to coordinate with each other in order to receive the highest-possible value. A number of reinforcement learning algorithms have been proposed for this problem, and some have been shown to converge to good solutions in the limit. In this paper we show that using very simple model-based algorithms, much better (i.e., polynomial) convergence rates can be attained. Moreover, our model-based algorithms are guaranteed to converge to the optimal value, unlike many of the existing algorithms.