Vision language models have difficulty recognizing virtual objects