Conservative Exploration for Policy Optimization via Off-Policy Policy Evaluation