

When to use parametric models in reinforcement learning?

van Hasselt, Hado, Hessel, Matteo, Aslanides, John

Neural Information Processing Systems

We examine the question of when and how parametric models are most useful in reinforcement learning. In particular, we look at commonalities and differences between parametric models and experience replay. Replay-based learning algorithms share important traits with model-based approaches, including the ability to plan: to use more computation without additional data to improve predictions and behaviour. We discuss when to expect benefits from either approach, and interpret prior work in this context. We hypothesise that, under suitable conditions, replay-based algorithms should be competitive with or better than model-based algorithms if the model is used only to generate fictional transitions from observed states for an update rule that is otherwise model-free.
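The hypothesis can be illustrated with a minimal tabular sketch (illustrative only; the two-state environment, constants, and names below are invented for this example, not taken from the paper): the same model-free Q-learning update rule is fed transitions either resampled from a replay buffer or generated by a model that is queried only at previously observed states. With a perfect model, the two streams of updates are of the same kind.

```python
import random
from collections import defaultdict

random.seed(0)
ACTIONS = [0, 1]

def q_update(Q, s, a, r, s2, alpha=0.5, gamma=0.9):
    # One model-free Q-learning step; the update rule never knows
    # whether (s, a, r, s2) was replayed or generated by a model.
    target = r + gamma * max(Q[(s2, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

Q = defaultdict(float)

# Real experience gathered once: a tiny two-state chain in which
# state 1 is absorbing and pays reward 1.
replay_buffer = [(0, 1, 0.0, 1), (1, 1, 1.0, 1)]
observed_states = sorted({s for (s, _, _, _) in replay_buffer})

def model(s, a):
    # A (here deliberately perfect) parametric model of the chain.
    return (0.0, 1) if s == 0 else (1.0, 1)

# Planning with replay: resample stored real transitions.
for _ in range(200):
    s, a, r, s2 = random.choice(replay_buffer)
    q_update(Q, s, a, r, s2)

# Dyna-style planning: query the model, but only from truly
# observed states. With a perfect model, these updates are
# indistinguishable in kind from the replayed ones above.
for _ in range(200):
    s = random.choice(observed_states)
    a = random.choice(ACTIONS)
    r, s2 = model(s, a)
    q_update(Q, s, a, r, s2)
```

Because the model here is exact at observed states, both loops drive Q toward the same fixed point; the paper's hypothesis concerns precisely this regime, where an imperfect learned model used in this way has no advantage over replaying the real transitions.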


Reviews: When to use parametric models in reinforcement learning?

Neural Information Processing Systems

This paper broadly considers the use of a learned parametric model. Through (1) toy examples, (2) theoretical analysis of a Dyna-like algorithm, and (3) a large-scale study of sample-efficient model-free RL, it arrives at the conclusion that "using an imperfect (e.g., parametric) model to generate fictional experiences from truly observed states… should probably not result in better learning." While the individual pieces described above are all valuable, I am not sure this claim is properly qualified. For example: "More generally, if we use a perfect model to generate experiences only from states that were actually observed, the resulting updates would be indistinguishable from doing experience replay. In a sense, replay is a perfect model, albeit only from the states we have observed." I am not sure this is, as stated, exactly true.


Reviews: When to use parametric models in reinforcement learning?

Neural Information Processing Systems

There is consensus that this is a well-written paper that offers useful insights about the pros and cons of model-based RL versus model-free RL with replay buffers. This is an important topic, and the paper has the potential to make a significant impact. However, the authors are urged to be careful not to draw overly general conclusions in the final version of the paper, as this was a concern of one reviewer. Even the title may be too general.





When to use parametric models in reinforcement learning?

van Hasselt, Hado, Hessel, Matteo, Aslanides, John

arXiv.org Artificial Intelligence

We examine the question of when and how parametric models are most useful in reinforcement learning. In particular, we look at commonalities and differences between parametric models and experience replay. Replay-based learning algorithms share important traits with model-based approaches, including the ability to plan: to use more computation without additional data to improve predictions and behaviour. We discuss when to expect benefits from either approach, and interpret prior work in this context. We hypothesise that, under suitable conditions, replay-based algorithms should be competitive with or better than model-based algorithms if the model is used only to generate fictional transitions from observed states for an update rule that is otherwise model-free. We validated this hypothesis on Atari 2600 video games. The replay-based algorithm attained state-of-the-art data efficiency, improving over prior results with parametric models.