Enforcing the consensus between Trajectory Optimization and Policy Learning for precise robot control