A Details of the Experiments

Neural Information Processing Systems 

We define $\delta(s,a) = 1$ if $\mathbb{1}\{s = 0\} = \mathbb{1}\{a = 0\}$, and $\delta(s,a) = 0$ otherwise. It is straightforward to verify that this is a valid time-inhomogeneous linear MDP. The results are reported in Figure 2. As mentioned in the discussion following Theorem 4.1, it holds that [...] These findings also shed light on the minimax optimality of the OPE problem. VA-OPE is a promising candidate for achieving minimax optimality. We further investigate this in the next subsection.
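To make the definition of δ concrete, the following minimal Python sketch evaluates it on a few state-action pairs. It assumes that states and actions are encoded as integers, which this section does not specify; the function name `delta` is chosen here for illustration only.

```python
def delta(s: int, a: int) -> int:
    """Return 1 when the indicators 1{s = 0} and 1{a = 0} agree, else 0.

    Assumption: states and actions are integer-encoded (not stated in the text).
    """
    # (s == 0) and (a == 0) are the two indicator values as booleans;
    # delta is 1 exactly when both are true or both are false.
    return int((s == 0) == (a == 0))


# Enumerate a small grid of (s, a) pairs to show where delta equals 1.
table = {(s, a): delta(s, a) for s in range(3) for a in range(3)}
```

Under this encoding, δ equals 1 both when the state and action are zero together and when both are non-zero, and it equals 0 when exactly one of them is zero.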
