Supplement for Counterexample Guided RL Policy Refinement Using Bayesian Optimization