Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically
