that (1) there is a bijection between state spaces and (2) through which the subMDPs have the same transition/reward
Neural Information Processing Systems
We thank all reviewers for spending their valuable time reviewing our paper. We now answer some specific questions in detail. The definition of "equivalent subMDPs" (Definition 2) requires that (1) there is a bijection between state spaces and (2) through which the subMDPs have the same transition/reward models. As discussed in the paper, (2) can be relaxed to similar transition/reward models. For the statistical efficiency results, this assumption could be relaxed, e.g. if a ... However, it is beyond the scope of this paper and we aim to address it in future work. We will add a more explicit discussion about the comparison to Mann et al. (2015). Theorem 1 in this paper is partially motivated by Osband et al. (2013); however, we consider a very different setting and ... Specifically, (1) Theorem 1 considers hierarchical structure while Osband et al. ...
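To make the two conditions of the equivalence definition concrete, here is a minimal Python sketch for the tabular case. It is an illustration under stated assumptions, not the paper's own procedure: the function name `submdps_equivalent`, the array layout (transitions of shape `(S, A, S)`, rewards of shape `(S, A)`), the shared action set, and the `tol` parameter are all hypothetical choices made for this example. Setting `tol > 0` loosely mimics the relaxation from "same" to "similar" transition/reward models mentioned above.

```python
import numpy as np

def submdps_equivalent(P1, R1, P2, R2, perm, tol=0.0):
    """Check two tabular subMDPs for equivalence under a candidate state map.

    Assumptions (hypothetical, for illustration only):
      P1, P2: transition arrays of shape (S, A, S), P[s, a, s2] = Pr(s2 | s, a)
      R1, R2: reward arrays of shape (S, A)
      perm:   perm[s] is the state of subMDP 2 matched to state s of subMDP 1
      tol:    0.0 for exact equality, > 0 for "similar" models
    """
    P1, R1, P2, R2 = map(np.asarray, (P1, R1, P2, R2))
    if P1.shape != P2.shape or R1.shape != R2.shape:
        return False
    n_states = P1.shape[0]
    # Condition (1): the state map must be a bijection.
    if sorted(int(s) for s in perm) != list(range(n_states)):
        return False
    perm = np.asarray(perm, dtype=int)
    # Relabel subMDP 2 into subMDP 1's state indexing.
    P2_relabeled = P2[perm][:, :, perm]
    R2_relabeled = R2[perm]
    # Condition (2): models agree through the bijection (up to tol).
    same_P = np.max(np.abs(P1 - P2_relabeled)) <= tol
    same_R = np.max(np.abs(R1 - R2_relabeled)) <= tol
    return bool(same_P and same_R)
```

Passing a map that repeats a state makes the check fail on condition (1), while a valid bijection with mismatched rewards fails on condition (2) unless `tol` is large enough to absorb the difference.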
Nov-13-2025, 23:38:18 GMT