Policy Gradient Bayesian Robust Optimization for Imitation Learning