Does Data Scaling Lead to Visual Compositional Generalization?