The Online Coupon-Collector Problem and Its Application to Lifelong Reinforcement Learning
arXiv.org Artificial Intelligence
Transferring knowledge across a sequence of related tasks is an important challenge in reinforcement learning (RL). Despite much encouraging empirical evidence, there has been little theoretical analysis. In this paper, we study a class of lifelong RL problems: the agent solves a sequence of tasks modeled as finite Markov decision processes (MDPs), each belonging to a finite set of MDPs that share the same state and action sets but differ in their transition and reward functions. Motivated by the need for cross-task exploration in lifelong learning, we formulate a novel online coupon-collector problem and give an optimal algorithm. This allows us to develop a new lifelong RL algorithm whose overall sample complexity over a sequence of tasks is much smaller than that of single-task learning, even if the sequence of tasks is generated by an adversary. Benefits of the algorithm are demonstrated in simulated problems, including a recently introduced human-robot interaction problem.
Sep-21-2015
- Country:
  - North America > United States (0.67)
- Genre:
  - Research Report (0.82)
  - Workflow (0.74)
- Industry:
  - Education (0.34)