Multi-agent reinforcement learning has received significant interest in recent years notably due to the advancements made in deep reinforcement learning which have allowed for the developments of new architectures and learning algorithms. Using social dilemmas as the training ground, we present a novel learning architecture, Learning through Probing (LTP), where agents utilize a probing mechanism to incorporate how their opponent's behavior changes when an agent takes an action. We use distinct training phases and adjust rewards according to the overall outcome of the experiences accounting for changes to the opponents behavior. We introduce a parameter η to determine the significance of these future changes to opponent behavior. When applied to the Iterated Prisoner's Dilemma, LTP agents demonstrate that they can learn to cooperate with each other, achieving higher average cumulative rewards than other reinforcement learning methods while also maintaining good performance in playing against static agents that are present in Axelrod tournaments. We compare this method with traditional reinforcement learning algorithms and agent-tracking techniques to highlight key differences and potential applications. We also draw attention to the differences between solving games and societal-like interactions and analyze the training of Q-learning agents in makeshift societies. This is to emphasize how cooperation may emerge in societies and demonstrate this using environments where interactions with opponents are determined through a random encounter format of the iterated prisoner's dilemma.
Safe learning agents are agents whose learned behaviour can be predicted and analysed. Non-symbolic learning algorithms such as reinforcement learning rely on emergence and thus are not a good candidate to building safe AI systems. Our contention is that logicbased algorithms such as explanation-based learning and inductive logic programming should instead be used to design and implement safe, intelligent agents.
We discuss the role of coordination as a direct learning objective in multi-agent reinforcement learning (MARL) domains. To this end, we present a novel means of quantifying coordination in multi-agent systems, and discuss the implications of using such a measure to optimize coordinated agent policies. This concept has important implications for adversary-aware RL, which we take to be a sub-domain of multi-agent learning.
Multi-agent reinforcement learning has shown promise on a variety of cooperative tasks as a consequence of recent developments in differentiable inter-agent communication. However, most architectures are limited to pools of homogeneous agents, limiting their applicability. Here we propose a modular framework for learning complex tasks in which a traditional monolithic agent is framed as a collection of cooperating heterogeneous agents. We apply this approach to model sensorimotor coordination in the neocortex as a multi-agent reinforcement learning problem. Our results demonstrate proof-of-concept of the proposed architecture and open new avenues for learning complex tasks and for understanding functional localization in the brain and future intelligent systems.
Learning by observation can be of key importance whenever agents sharing similar features want to learn from each other. This paper presents an agent architecture that enables software agents to learn by direct observation of the actions executed by expert agents while they are performing a task. This is possible because the proposed architecture displays information that is essential for observation, making it possible for software agents to observe each other. The agent architecture supports a learning process that covers all aspects of learning by observation, such as discovering and observing experts, learning from the observed data, applying the acquired knowledge and evaluating the agent's progress. The evaluation provides control over the decision to obtain new knowledge or apply the acquired knowledge to new problems.