Explaining How Visual, Textual and Multimodal Encoders Share Concepts