Counterexample Guided RL Policy Refinement Using Bayesian Optimization