Subgaussian and Differentiable Importance Sampling for Off-Policy Evaluation and Learning