The False Promise of Off-Policy Reinforcement Learning Algorithms

#artificialintelligence 

We have all witnessed the rapid development of reinforcement learning methods in the last couple of years. Most notably the biggest attention has been given to off-policy methods and the reason is quite obvious, they scale really well in comparison to other methods. Off-policy algorithms can (in principle) learn from data without interacting with the environment. This is a nice property, this means that we can collect our data by any means that we see fit and infer the optimal policy completely offline, in other words, we use a different behavioral policy that the one we are optimizing. Unfortunately, this doesn't work out of the box like most people think, as I will describe in this article.

Duplicate Docs Excel Report

Title
None found

Similar Docs  Excel Report  more

TitleSimilaritySource
None found