A Details of the Experiments
Neural Information Processing Systems
We define δ(s, a) = 1 if 1{s = 0} = 1{a = 0}, and δ(s, a) = 0 otherwise. It is straightforward to verify that this is a valid time-inhomogeneous linear MDP. The results are reported in Figure 2. As mentioned in the discussion following Theorem 4.1, it holds that
These findings also shed light on the minimax optimality of the OPE problem and suggest that VA-OPE is a promising candidate for achieving it. We investigate this further in the next subsection.
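To make the definition of δ concrete, here is a minimal sketch that evaluates it directly. It assumes states and actions are integer labels with 0 as the distinguished element; the helper names `indicator` and `delta` and the 3×3 grid of labels are hypothetical and only illustrate the formula, not the experimental setup itself.

```python
def indicator(condition: bool) -> int:
    """The 1{.} notation: 1 if the condition holds, else 0."""
    return 1 if condition else 0


def delta(s: int, a: int) -> int:
    """delta(s, a) = 1 if 1{s = 0} = 1{a = 0}, and 0 otherwise.

    Assumption: states and actions are integer labels, with 0 as the
    distinguished state/action referenced in the definition.
    """
    return 1 if indicator(s == 0) == indicator(a == 0) else 0


if __name__ == "__main__":
    # Print delta on a small, hypothetical grid of state/action labels.
    for s in range(3):
        print(s, [delta(s, a) for a in range(3)])
    # 0 [1, 0, 0]
    # 1 [0, 1, 1]
    # 2 [0, 1, 1]
```

In words, δ(s, a) equals 1 exactly when s and a are either both 0 or both nonzero.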
Aug-14-2025, 05:18:27 GMT