World-to-Words: Grounded Open Vocabulary Acquisition through Fast Mapping in Vision-Language Models