World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models
