Differentiable Parsing and Visual Grounding of Natural Language Instructions for Object Placement