Modular Framework for Visuomotor Language Grounding