Nonparametric Model-Based Reinforcement Learning
Neural Information Processing Systems
This paper describes some of the interactions of model learning algorithms and planning algorithms we have found in exploring model-based reinforcement learning. The paper focuses on how local trajectory optimizers can be used effectively with learned nonparametric models. We find that trajectory planners that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. Trajectory planners that balance obeying the learned model with minimizing cost (or maximizing reward) often do better, even if the plan is not fully consistent with the learned model.

1 INTRODUCTION

We are exploring the use of nonparametric models in robot learning (Atkeson et al., 1997b; Atkeson and Schaal, 1997). This paper describes the interaction of model learning algorithms and planning algorithms, focusing on how local trajectory optimization can be used effectively with nonparametric models in reinforcement learning. We find that trajectory optimizers that are fully consistent with the learned model often have difficulty finding reasonable plans in the early stages of learning. The message of this paper is that a planner should not be entirely consistent with the learned model during model-based reinforcement learning.
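The trade-off described above, obeying the learned model versus minimizing cost, can be illustrated with a penalty-method trajectory optimizer: instead of enforcing the learned dynamics as a hard constraint, violations are penalized with a finite weight, so the plan may deviate from an inaccurate model where that buys lower cost. The following is a minimal sketch of that idea, not the paper's implementation; the scalar "learned model" `f_hat`, the quadratic cost, and the penalty weight `lam` are all illustrative assumptions.

```python
# Soft-consistency trajectory optimization: penalize, rather than enforce,
# agreement with a learned dynamics model (illustrative sketch).

def f_hat(x, u):
    # Stand-in for a learned (e.g. nonparametric) model; here a crude
    # linear fit chosen for illustration only.
    return 0.9 * x + 0.5 * u

def objective(z, x0, T, lam):
    # z packs the states x[1..T] followed by the controls u[0..T-1].
    xs = [x0] + list(z[:T])
    us = list(z[T:])
    # Running quadratic cost on states and controls, plus a terminal cost.
    cost = sum(x * x + u * u for x, u in zip(xs[:-1], us))
    cost += xs[-1] ** 2
    # Soft dynamics penalty: a finite lam lets the plan deviate from the
    # learned model; lam -> infinity recovers a fully consistent planner.
    cost += lam * sum((xs[t + 1] - f_hat(xs[t], us[t])) ** 2
                      for t in range(T))
    return cost

def optimize(x0=1.0, T=5, lam=10.0, steps=2000, lr=0.01, eps=1e-5):
    # Plain gradient descent with finite-difference gradients, to keep the
    # sketch dependency-free; any smooth optimizer would do.
    z = [0.0] * (2 * T)
    for _ in range(steps):
        base = objective(z, x0, T, lam)
        grad = []
        for i in range(len(z)):
            zp = z[:]
            zp[i] += eps
            grad.append((objective(zp, x0, T, lam) - base) / eps)
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return z, objective(z, x0, T, lam)

z, val = optimize()
```

Raising `lam` drives the plan toward full consistency with `f_hat`; early in learning, when the model is poor, a smaller `lam` lets the optimizer find reasonable low-cost plans that the model alone would rule out.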
Dec-31-1998