We discuss the role of coordination as a direct learning objective in multi-agent reinforcement learning (MARL) domains. To this end, we present a novel means of quantifying coordination in multi-agent systems, and discuss the implications of using such a measure to optimize coordinated agent policies. This concept has important implications for adversary-aware RL, which we take to be a sub-domain of multi-agent learning.
Multi-agent reinforcement learning has received significant interest in recent years notably due to the advancements made in deep reinforcement learning which have allowed for the developments of new architectures and learning algorithms. Using social dilemmas as the training ground, we present a novel learning architecture, Learning through Probing (LTP), where agents utilize a probing mechanism to incorporate how their opponent's behavior changes when an agent takes an action. We use distinct training phases and adjust rewards according to the overall outcome of the experiences accounting for changes to the opponents behavior. We introduce a parameter η to determine the significance of these future changes to opponent behavior. When applied to the Iterated Prisoner's Dilemma, LTP agents demonstrate that they can learn to cooperate with each other, achieving higher average cumulative rewards than other reinforcement learning methods while also maintaining good performance in playing against static agents that are present in Axelrod tournaments. We compare this method with traditional reinforcement learning algorithms and agent-tracking techniques to highlight key differences and potential applications. We also draw attention to the differences between solving games and societal-like interactions and analyze the training of Q-learning agents in makeshift societies. This is to emphasize how cooperation may emerge in societies and demonstrate this using environments where interactions with opponents are determined through a random encounter format of the iterated prisoner's dilemma.
When DeepMind's AlphaGo defeated South Korean master Lee Se-dol, it was a historic stride for AI. The depth of this development, coupled with higher computing power and cheaper data storage, is moving AI into the mainstream. Perhaps the most popular application of AI today comes in the form of virtual assistants and bots, or "agents" as my good friend Shivon defines them. An agent can schedule your meetings, manage your finances, book your travels, order your meals, and more. And even though these agents are typically focused on one specific task, it's remarkable to consider how much progress we have made outsourcing mundane work for a fraction of the cost.
Learning by observation can be of key importance whenever agents sharing similar features want to learn from each other. This paper presents an agent architecture that enables software agents to learn by direct observation of the actions executed by expert agents while they are performing a task. This is possible because the proposed architecture displays information that is essential for observation, making it possible for software agents to observe each other. The agent architecture supports a learning process that covers all aspects of learning by observation, such as discovering and observing experts, learning from the observed data, applying the acquired knowledge and evaluating the agent's progress. The evaluation provides control over the decision to obtain new knowledge or apply the acquired knowledge to new problems.
Abstract: We are increasingly surrounded by artificially intelligent technology that takes decisions and executes actions on our behalf. This creates a pressing need for general means to communicate with, instruct and guide artificial agents, with human language the most compelling means for such communication. To achieve this in a scalable fashion, agents must be able to relate language to the world and to actions; that is, their understanding of language must be grounded and embodied. However, learning grounded language is a notoriously challenging problem in artificial intelligence research. Here we present an agent that learns to interpret language in a simulated 3D environment where it is rewarded for the successful execution of written instructions.