Model-free Reinforcement Learning for Robust Locomotion Using Trajectory Optimization for Exploration