Learning Syntax Without Planting Trees: Understanding When and Why Transformers Generalize Hierarchically