A Vision-free Baseline for Multimodal Grammar Induction
