Export Reviews, Discussions, Author Feedback and Meta-Reviews
Neural Information Processing Systems
How do these compare to the regret bounds of the paper at hand?
- After the definition of regret, it is noted that the regret is random due to the randomness of M* (as well as the randomness of the algorithms and observations). It is not clear to me why M* is assumed to be random rather than a fixed underlying MDP.
- In the definition of factored MDPs, I did not understand the role of the set X. Does it correspond to a set of state-action pairs?
Oct-2-2025, 17:47:56 GMT