A Additional Results

The goal is to minimize the system regret, so regret should be the primary measure of performance; however, we also provide a variety of other metrics to give a broader picture of how

Neural Information Processing Systems 

We generate 24000 data points in total, from which n=2000 are sampled for each randomly seeded run of experiments; the context space consists of 8 attributes, and the decision space consists of 4 rocket actions.
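The sampling protocol described above can be sketched as follows. This is a minimal illustration, not the actual data generator: the pool contents here are placeholder Gaussian contexts and uniformly random action labels, the names `make_pool` and `sample_run` are hypothetical, and sampling without replacement is an assumption. Only the sizes (24000 points, n=2000 per run, 8 attributes, 4 actions) come from the text.

```python
import random

N_TOTAL = 24_000    # total generated data points (from the text)
N_PER_RUN = 2_000   # points sampled per seeded run (from the text)
N_ATTRIBUTES = 8    # size of the context space (from the text)
N_ACTIONS = 4       # size of the decision space (from the text)


def make_pool(seed=0):
    """Build a placeholder pool of (context, action) pairs.

    ASSUMPTION: the real generating process is not specified here, so
    contexts are standard Gaussian draws and actions are uniform labels.
    """
    rng = random.Random(seed)
    pool = []
    for _ in range(N_TOTAL):
        context = [rng.gauss(0.0, 1.0) for _ in range(N_ATTRIBUTES)]
        action = rng.randrange(N_ACTIONS)
        pool.append((context, action))
    return pool


def sample_run(pool, seed):
    """Draw the subset used by one experimental run.

    ASSUMPTION: sampling is without replacement; the per-run seed makes
    the draw reproducible.
    """
    rng = random.Random(seed)
    return rng.sample(pool, N_PER_RUN)


pool = make_pool()
run_a = sample_run(pool, seed=1)
run_b = sample_run(pool, seed=1)  # same seed, same subset
run_c = sample_run(pool, seed=2)  # different seed, different subset
```

Seeding each run separately from the pool keeps the full data set fixed across runs while letting every run see its own reproducible subset.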