Vision language models have difficulty recognizing virtual objects
