Hierarchical model-based policy optimization: from actions to action sequences and back