Visually Grounded, Situated Learning in Neural Models