Lifetime policy reuse and the importance of task capacity

Bossens, David M., Sobey, Adam J.

arXiv.org Artificial Intelligence 

A long-standing challenge in artificial intelligence is lifelong learning. In lifelong learning, many tasks are presented in sequence and learners must efficiently transfer knowledge between tasks while avoiding catastrophic forgetting over long lifetimes. On these problems, policy reuse and other multi-policy reinforcement learning techniques can learn many tasks. However, they can generate many temporary or permanent policies, resulting in memory issues. Consequently, there is a need for lifetime-scalable methods that continually refine a policy library of a pre-defined size. This paper presents a first approach to lifetime-scalable policy reuse. To pre-select the number of policies, a notion of task capacity, the maximal number of tasks that a policy can accurately solve, is proposed. To evaluate lifetime policy reuse using this method, two state-of-the-art single-actor base-learners are compared: 1) a value-based reinforcement learner, Deep Q-Network (DQN) or Deep Recurrent Q-Network (DRQN); and 2) an actor-critic reinforcement learner, Proximal Policy Optimisation (PPO) with or without Long Short-Term Memory layer. By selecting the number of policies based on task capacity, D(R)QN achieves near-optimal performance with 6 policies in a 27-task MDP domain and 9 policies in an 18-task POMDP domain; with fewer policies, catastrophic forgetting and negative transfer are observed. Due to slow, monotonic improvement, PPO requires fewer policies, 1 policy for the 27-task domain and 4 policies for the 18-task domain, but it learns the tasks with lower accuracy than D(R)QN. These findings validate lifetime-scalable policy reuse and suggest using D(R)QN for larger and PPO for smaller library sizes. During their lifetime, animals may be subjected to a large number of unknown tasks. In some environmental conditions, nutritious food sources may be readily available, while in others they may be sparse, hidden, or even poisonous, and dangerous predators may roam in their vicinity. To address these challenging conditions, various behaviours must be selectively combined, such as avoidance, reward-seeking, or even fleeing. When direct perception provides limited or no cues about the current task, animals have to infer the task or use a strategy that works for many different tasks it may encounter. Therefore, a key challenge for learning challenging sequences of tasks is to find a limited number of strategies that work on the large domain of tasks that animals encounter over their lifetime. In artificial intelligence, variants of the above problem have been studied with investigations focusing on two aspects: transfer learning and catastrophic forgetting. Transfer learning is a process in which learners leverage the knowledge gained from a set of previously learned tasks with similar characteristics to a new task, whilst avoiding transferring knowledge that is not relevant [Taylor and Stone, 2009, Pan and Yang, 2010, Lazaric, 2013].

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found