Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks