Reinforcement learning's foundational flaw
In this essay, we address the limitations of reinforcement learning (RL), one of the core fields of AI. Along the way, we will encounter a fun allegory, a set of methods for incorporating prior knowledge and instruction into deep learning, and a radical conclusion.[1] The first part, which you're reading right now, sets up what RL is and why it (or at least a particular version of it, which we shall name 'pure RL' and soon define) is fundamentally flawed. It contains some explanation that AI practitioners can skip, but be sure to stick around for the discussion of recent non-pure-RL work that, we shall argue, represents the fix to pure RL's foundational flaw. But for now, let us start with the allegory.
Jan-11-2019, 17:21:00 GMT