Learning Task Decomposition with Ordered Memory Policy Network