The Multi-Faceted Monosemanticity in Multimodal Representations