Learning by observation can be of key importance whenever agents sharing similar features want to learn from each other. This paper presents an agent architecture that enables software agents to learn by direct observation of the actions executed by expert agents while they are performing a task. This is possible because the proposed architecture displays information that is essential for observation, making it possible for software agents to observe each other. The agent architecture supports a learning process that covers all aspects of learning by observation, such as discovering and observing experts, learning from the observed data, applying the acquired knowledge and evaluating the agent's progress. The evaluation provides control over the decision to obtain new knowledge or apply the acquired knowledge to new problems. We combine two methods for learning from the observed information. The first one, the recall method, uses the sequence on which the actions were observed to solve new problems. The second one, the classification method, categorizes the information in the observed data and determines to which set of categories the new problems belong. Results show that agents are able to learn in conditions where common supervised learning algorithms fail, such as when agents do not know the results of their actions a priori or when not all the effects of the actions are visible. The results also show that our approach provides better results than other learning methods since it requires shorter learning periods.
In this paper, we consider a territory guarding game involving pursuers, evaders and a target in an environment that contains obstacles. The goal of the evaders is to capture the target, while that of the pursuers is to capture the evaders before they reach the target. All the agents have limited sensing range and can only detect each other when they are in their observation space. We focus on the challenge of effective cooperation between agents of a team. Finding exact solutions for such multi-agent systems is difficult because of the inherent complexity. We present Multi-Agent Pursuer-Evader Learning (MAPEL), a class of algorithms that use spatio-temporal graph representation to learn structured cooperation. The key concept is that the learning takes place in a decentralized manner and agents use situation report updates to learn about the whole environment from each others' partial observations. We use Recurrent Neural Networks (RNNs) to parameterize the spatio-temporal graph. An agent in MAPEL only updates all the other agents if an opponent or the target is inside its observation space by using situation report. We present two methods for cooperation via situation report update: a) Peer-to-Peer Situation Report (P2PSR) and b) Ring Situation Report (RSR). We present a detailed analysis of how these two cooperation methods perform when the number of agents in the game are increased. We provide empirical results to show how agents cooperate under these two methods.
Learning from Observation (LfO), also known as Behavioral Cloning, is an approach for building software agents by recording the behavior of an expert (human or artificial) and using the recorded data to generate the required behavior. jLOAF is a platform that uses Case-Based Reasoning to achieve LfO. In this paper we interface jLOAF with the popular OpenAI Gym environment. Our experimental results show how our approach can be used to provide a baseline for comparison in this domain, as well as identify the strengths and weaknesses when dealing with environmental complexity.
Many real-world problems, such as network packet routing and urban traffic control, are naturally modeled as multi-agent reinforcement learning (RL) problems. However, existing multi-agent RL methods typically scale poorly in the problem size. Therefore, a key challenge is to translate the success of deep learning on single-agent RL to the multi-agent setting. A major stumbling block is that independent Q-learning, the most popular multi-agent RL method, introduces nonstationarity that makes it incompatible with the experience replay memory on which deep Q-learning relies. This paper proposes two methods that address this problem: 1) using a multi-agent variant of importance sampling to naturally decay obsolete data and 2) conditioning each agent's value function on a fingerprint that disambiguates the age of the data sampled from the replay memory. Results on a challenging decentralised variant of StarCraft unit micromanagement confirm that these methods enable the successful combination of experience replay with multi-agent RL.
Multi-agent reinforcement learning is a standard framework for modeling multi-agent interactions applied in real-world scenarios. Inspired by experience sharing in human groups, learning knowledge parallel reusing between agents can potentially promote team learning performance, especially in multi-task environments. When all agents interact with the environment and learn simultaneously, how each independent agent selectively learns from other agents' behavior knowledge is a problem that we need to solve. This paper proposes a novel knowledge transfer framework in MARL, PAT (Parallel Attentional Transfer). We design two acting modes in PAT, student mode and self-learning mode. Each agent in our approach trains a decentralized student actor-critic to determine its acting mode at each time step. When agents are unfamiliar with the environment, the shared attention mechanism in student mode effectively selects learning knowledge from other agents to decide agents' actions. PAT outperforms state-of-the-art empirical evaluation results against the prior advising approaches. Our approach not only significantly improves team learning rate and global performance, but also is flexible and transferable to be applied in various multi-agent systems.