Improving Unimodal Inference with Multimodal Transformers