Off-Policy Evaluation via the Regularized Lagrangian