Do Vision-Language Pretrained Models Learn Composable Primitive Concepts?