Hidden Parameter Markov Decision Processes: An Emerging Paradigm for Modeling Families of Related Tasks
Konidaris, George (Duke University) | Doshi-Velez, Finale (Harvard Medical School)
The goal of transfer is to use knowledge obtained by solving one task to improve a robot's (or software agent's) performance in future tasks. In general, we do not expect this to work; for transfer to be feasible, there must be something in common between the source task(s) and target task(s). The question at the core of the transfer learning enterprise is therefore: what makes two tasks related? Or, more generally, how do you define a family of related tasks? Given a precise definition of how a particular family of tasks is related, we can formulate clear optimization methods for selecting source tasks and for determining what knowledge should be imported from the source task(s) and how it should be used in the target task(s).

This paper describes one model that has appeared in several different research scenarios where an agent is faced with a family of tasks that have similar, but not identical, dynamics (or reward functions). For example, a human learning to play baseball may, over the course of their career, be exposed to several different bats, each with slightly different weights and lengths. A human who has learned to play baseball well with one bat would be expected to be able to pick up any similar bat and use it. Similarly, when learning to drive a car, one may learn in more than one car, and then be expected to be able to drive any make and model of car (within reasonable variations) with little or no relearning. These examples are instances of exactly the kind of flexible, reliable, and sample-efficient behavior that we should be aiming to achieve in robotics applications.

One way to model such a family of tasks is to posit that the tasks are generated by a small set of latent parameters (e.g., the length and weight of the bat, or parameters describing the various physical properties of the car's steering system and clutch) that are fixed for each problem instance (e.g., for each bat, or car) but are not directly observable by the agent.
Defining a distribution over these latent parameters results in a family of related tasks, and transfer is feasible to the extent that the number of latent variables is small, that the task dynamics (or reward function) vary smoothly with them, and that they can either be ignored or identified using transition data from the task. This model has appeared under several different names in the literature; we refer to it as a hidden-parameter Markov decision process (or HiP-MDP).
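The generative structure just described can be made concrete with a minimal sketch (not taken from the paper; the names, the 1-D dynamics, and the specific parameters `gain` and `drag` are illustrative assumptions): each task instance draws hidden parameters once from a fixed distribution, those parameters shape the transition dynamics, and the agent sees only states and rewards.

```python
import random

class HiPMDPInstance:
    """One task drawn from the family: theta is fixed for this instance
    but never exposed to the agent (hypothetical 1-D point-mass example)."""

    def __init__(self, theta):
        self._theta = theta  # latent parameters, hidden from the agent
        self.state = 0.0

    def step(self, action):
        # Dynamics vary smoothly with the hidden parameters
        # (here: an assumed 'gain' and 'drag', analogous to bat weight/length).
        gain, drag = self._theta
        self.state += gain * action - drag * self.state
        reward = -abs(self.state - 1.0)  # shared objective: reach state 1.0
        return self.state, reward

def sample_task(rng):
    # The distribution over latent parameters defines the family of tasks.
    theta = (rng.uniform(0.5, 1.5),   # gain
             rng.uniform(0.0, 0.2))   # drag
    return HiPMDPInstance(theta)

rng = random.Random(0)
tasks = [sample_task(rng) for _ in range(3)]
# Identical actions produce different trajectories across instances;
# the agent must either ignore theta or infer it from transition data.
outcomes = [task.step(1.0) for task in tasks]
```

Transfer in this framing amounts to learning, across instances, either a policy robust to the latent parameters or a way to identify them quickly from early transitions.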
Nov-1-2014