Reviews: Linear Feature Encoding for Reinforcement Learning

Neural Information Processing Systems 

Summary: The idea of coupling reward and dynamics in an autoencoder-like model is a novel contribution that could benefit our community. I also appreciate that the authors have applied their model to pixel-based observation spaces. However, I find the theory of lines 124 to 136 unnecessary, and the fact that it reproduces Parr (2007) line by line is problematic (more on this below). Also, example 1 seems misguided, since it simply does not adopt the right problem formulation to start with (it seems sufficient to simply start with a Markov chain over state-action pairs).

Detailed comments:

Abstract, l. 4, and sect. 1, l. 25: "Typical deep RL [...]" and "It is common". Is that true? Besides DQN, what are other examples?
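To make the state-action formulation concrete, here is one standard way to write the construction I have in mind (my notation, not the authors'): given MDP dynamics and a fixed policy, the induced chain over state-action pairs is

```latex
% Assuming MDP dynamics P(s' \mid s, a) and a fixed policy \pi,
% the induced Markov chain over state-action pairs (s, a) is
P^{\pi}\bigl((s', a') \mid (s, a)\bigr) = P(s' \mid s, a)\,\pi(a' \mid s')
% Feature analysis in the style of Parr (2007) can then be carried out
% directly on this chain, with no separate treatment of states and actions.
```

Starting from this chain, example 1's difficulty does not arise, which is why I suggest adopting it as the problem formulation from the outset.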