Value function estimation in Markov reward processes: Instance-dependent $\ell_\infty$-bounds for policy evaluation
