A multimodal developmental benchmark for language learning