Off-policy Evaluation in Infinite-Horizon Reinforcement Learning with Latent Confounders