Universal Option Models
Yao, Hengshuai, Szepesvari, Csaba, Sutton, Richard S., Modayil, Joseph, Bhatnagar, Shalabh
Neural Information Processing Systems
We consider the problem of learning models of options for real-time abstract planning, in the setting where reward functions can be specified at any time and their expected returns must be computed efficiently. We introduce a new model of an option that is independent of any reward function, called the universal option model (UOM). We prove that, given a reward function, the UOM of an option can be used to construct a traditional option model, and that the option-conditional return can be computed directly as a single dot product of the UOM with the reward function. We extend the UOM to linear function approximation and show that it gives the TD solution for option returns and for the value functions of policies over options. We provide a stochastic approximation algorithm for incrementally learning UOMs from data and prove its consistency.
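The dot-product claim can be illustrated with a minimal tabular sketch. It is not the paper's learning algorithm: it assumes a known transition matrix under the option's policy, a reward that depends only on the current state, and an option that runs for at least one step. All names here (`occupancy`, `option_return`, `P`, `beta`, `gamma`) are hypothetical and chosen for illustration. The reward-independent part is a discounted state-occupancy vector per start state, and the return for any reward vector is then one dot product.

```python
def occupancy(P, beta, gamma, sweeps=100):
    """Discounted state occupancy of an option, by fixed-point iteration.

    P[s][s2]  : transition probability under the option's policy (assumed known)
    beta[s2]  : probability that the option terminates on arriving in s2
    gamma     : discount factor
    Returns u, where u[s][x] is the expected discounted number of visits
    to state x before termination when the option starts in s.
    """
    n = len(P)
    u = [[0.0] * n for _ in range(n)]
    for _ in range(sweeps):
        new_u = []
        for s in range(n):
            # The start state is always visited once at discount 1.
            row = [1.0 if x == s else 0.0 for x in range(n)]
            for s2 in range(n):
                # Continue from s2 only if the option does not terminate there.
                w = gamma * P[s][s2] * (1.0 - beta[s2])
                if w:
                    for x in range(n):
                        row[x] += w * u[s2][x]
            new_u.append(row)
        u = new_u
    return u


def option_return(u_row, reward):
    """Option return for one start state: a single dot product."""
    return sum(ui * ri for ui, ri in zip(u_row, reward))


# Toy chain 0 -> 1 -> 2; the option terminates on reaching state 2.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.0, 0.0, 1.0]]
beta = [0.0, 0.0, 1.0]
gamma = 0.5

U = occupancy(P, beta, gamma)  # computed once, with no reward function in sight

# Reward functions supplied later reuse the same occupancy vectors.
print(option_return(U[0], [1.0, 2.0, 4.0]))
print(option_return(U[0], [0.0, 10.0, 0.0]))
```

In this sketch the occupancy vectors are computed once and reused for every reward vector, which is the property the abstract relies on when reward functions may be specified at any time.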
Feb-14-2020, 07:11:45 GMT