Cross-modal Representation Flattening for Multi-modal Domain Generalization

Yunfeng Fan 1