CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives