Directed Policy Gradient for Safe Reinforcement Learning with Human Advice