Reviews: On Oracle-Efficient PAC RL with Rich Observations

Neural Information Processing Systems 

Moreover, the reward depends only on x_t and the action, not on the hidden state S_t. The authors then correctly state (again, lines 99-100) that this makes the problem an MDP over X. They argue that "The hidden states serve to introduce structure to the MDP and enable tractable learning," but I do not understand why this is the case.