Impact of Pretraining Word Co-occurrence on Compositional Generalization in Multimodal Models