Text encoders bottleneck compositionality in contrastive vision-language models