Reinforcement Learning in a Safety-Embedded MDP with Trajectory Optimization