Is Exploration or Optimization the Problem for Deep Reinforcement Learning?