Do Vision-Language Models Understand Compound Nouns?