Causal disentanglement of multimodal data