Supplement for Counterexample Guided RL Policy Refinement Using Bayesian Optimization

Neural Information Processing Systems 

We ran each of these methods for 20 iterations, with 200 test samples per iteration. We report the mean and standard deviation of the number of counterexamples discovered.
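The reporting protocol above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `count_counterexamples`, `run_experiment`, and `summarize`, the toy sampler, and the toy failure predicate are all hypothetical. It also assumes that the statistics are taken over the per-iteration counts and that the sample standard deviation is used, neither of which is stated in the text.

```python
import random
import statistics


def count_counterexamples(is_counterexample, sample_fn, n_samples=200):
    """Count how many of n_samples drawn test samples are flagged as counterexamples."""
    return sum(1 for _ in range(n_samples) if is_counterexample(sample_fn()))


def summarize(counts):
    """Return (mean, sample standard deviation) of per-iteration counts.

    Assumption: sample (n - 1) standard deviation; the text does not say which
    estimator was used.
    """
    return statistics.mean(counts), statistics.stdev(counts)


def run_experiment(is_counterexample, sample_fn, n_iterations=20, n_samples=200):
    """Repeat the counting procedure n_iterations times and summarize the counts."""
    counts = [
        count_counterexamples(is_counterexample, sample_fn, n_samples)
        for _ in range(n_iterations)
    ]
    return counts, summarize(counts)


# Toy stand-ins (hypothetical): samples are uniform in [0, 1) and a sample
# counts as a counterexample when it exceeds 0.9.
rng = random.Random(0)
counts, (mean_count, std_count) = run_experiment(lambda x: x > 0.9, rng.random)
```

In an actual evaluation, `sample_fn` would be replaced by the test-sample generator of the method under study, and `is_counterexample` by a check of the policy against the property of interest.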