Multi-Step Dyna Planning for Policy Evaluation and Control

Yao, Hengshuai, Bhatnagar, Shalabh, Diao, Dongcui, Sutton, Richard S., Szepesvári, Csaba

Feb-15-2020, 04:12:13 GMT–Neural Information Processing Systems

We extend the Dyna planning architecture for policy evaluation and control in two significant respects. First, we introduce multi-step Dyna planning, which projects the simulated state/feature many steps into the future. Our multi-step Dyna is based on a multi-step model, which we call the $\lambda$-model. The $\lambda$-model interpolates between the one-step model and an infinite-step model, and can be learned efficiently online. Second, for Dyna control we use a dynamic multi-step model that can predict the results of a sequence of greedy actions and track the optimal policy in the long run.
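The idea of interpolating between a one-step and an infinite-step model can be illustrated with a small sketch. The abstract does not state how the $\lambda$-model is defined, so the code below is only an assumption: it weights the k-step models F^k geometrically, by analogy with λ-returns in TD(λ), and truncates the sum at a finite horizon. The matrix `F` and the helpers `mat_mul` and `lambda_mixture` are hypothetical names introduced for illustration, not the authors' method.

```python
def mat_mul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def lambda_mixture(F, lam, horizon=200):
    """Truncated geometric mixture of k-step models (an assumed form):

        (1 - lam) * sum_{k=1..horizon} lam**(k-1) * F**k

    lam = 0 recovers the one-step model F; values of lam closer to 1
    put more weight on long-horizon projections.
    """
    n = len(F)
    mixed = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in F]  # holds F**k, starting at k = 1
    weight = 1.0 - lam             # holds (1 - lam) * lam**(k-1)
    for _ in range(horizon):
        for i in range(n):
            for j in range(n):
                mixed[i][j] += weight * power[i][j]
        power = mat_mul(power, F)
        weight *= lam
    return mixed


# Hypothetical one-step transition model over two states (illustrative data).
F = [[0.9, 0.1],
     [0.2, 0.8]]

one_step = lambda_mixture(F, lam=0.0)   # same as F
multi_step = lambda_mixture(F, lam=0.9) # blends many projection depths
```

Under this assumed weighting, the untruncated sum has the closed form (1 - λ) F (I - λF)^{-1} whenever I - λF is invertible, so the truncated loop is only a convenience for the sketch.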

artificial intelligence, multi-step dyna planning, policy evaluation and control


Technology:
- Information Technology > Artificial Intelligence (0.35)