Exploring Offline Policy Evaluation for the Continuous-Armed Bandit Problem
