Off-Policy Evaluation of Bandit Algorithm from Dependent Samples under Batch Update Policy
