Learning Hierarchical Structures with Differentiable Nondeterministic Stacks