Brain encoding models based on multimodal transformers can transfer across language and vision