A Off-policy evaluation dual objective

We formulate the estimation of the stationary state distribution µ