Doubly Optimal Policy Evaluation for Reinforcement Learning