Self-Attention Networks Can Process Bounded Hierarchical Languages
