Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems 

How do these compare to the regret bounds of the paper at hand?
- After the definition of regret, it is noted that the latter is random due to the randomness of M* (and the randomness of the algorithms and observations). It is not clear to me why M* is supposed to be random and not a fixed underlying MDP.
- In the definition of the factored MDPs I did not understand the role of the set X. Does this correspond to a set of state-action pairs?