Vlearn: Off-Policy Learning with Efficient State-Value Function Estimation